*spell.txt*     For Vim version 7.0aa.  Last change: 2005 Aug 30


                  VIM REFERENCE MANUAL    by Bram Moolenaar


Spell checking                                          *spell*

1. Quick start                  |spell-quickstart|
2. Remarks on spell checking    |spell-remarks|
3. Generating a spell file      |spell-mkspell|
4. Spell file format            |spell-file-format|

{Vi does not have any of these commands}

Spell checking is not available when the |+syntax| feature has been disabled
at compile time.

==============================================================================
1. Quick start                                          *spell-quickstart*

This command switches on spell checking: >

        :setlocal spell spelllang=en_us

This switches on the 'spell' option and specifies to check for US English.

The words that are not recognized are highlighted with one of these:
        SpellBad        word not recognized                     |hl-SpellBad|
        SpellCap        word not capitalised                    |hl-SpellCap|
        SpellRare       rare word                               |hl-SpellRare|
        SpellLocal      wrong spelling for selected region      |hl-SpellLocal|

Vim only checks words for spelling; there is no grammar check.

If the 'mousemodel' option is set to "popup" and the cursor is on a badly
spelled word or it is "popup_setpos" and the mouse pointer is on a badly
spelled word, then the popup menu will contain a submenu to replace the bad
word. Note: this slows down the appearance of the popup menu.

To search for the next misspelled word:

                                                        *]s* *E756*
]s                      Move to next misspelled word after the cursor.
                        A count before the command can be used to repeat.
                        'wrapscan' applies.

                                                        *[s*
[s                      Like "]s" but search backwards, find the misspelled
                        word before the cursor. Doesn't recognize words
                        split over two lines, thus may stop at words that are
                        not highlighted as bad. Does not stop at a word with
                        a missing capital at the start of a line.

                                                        *]S*
]S                      Like "]s" but only stop at bad words, not at rare
                        words or words for another region.

                                                        *[S*
[S                      Like "]S" but search backwards.


To add words to your own word list:                     *E764*

                                                        *zg*
zg                      Add word under the cursor as a good word to the first
                        name in 'spellfile'. A count may precede the command
                        to indicate the entry in 'spellfile' to be used. A
                        count of two uses the second entry.

                        In Visual mode the selected characters are added as a
                        word (including white space!).
                        When the cursor is on text that is marked as badly
                        spelled then the marked text is used.
                        Otherwise the word under the cursor, separated by
                        non-word characters, is used.

                        If the word is explicitly marked as a bad word in
                        another spell file the result is unpredictable.

                                                        *zG*
zG                      Like "zg" but add the word to the internal word list
                        |internal-wordlist|.

                                                        *zw*
zw                      Like "zg" but mark the word as a wrong (bad) word.

                                                        *zW*
zW                      Like "zw" but add the word to the internal word list
                        |internal-wordlist|.

                                                        *:spe* *:spellgood*
:[count]spe[llgood] {word}
                        Add {word} as a good word to 'spellfile', like with
                        "zg". Without count the first name is used, with a
                        count of two the second entry, etc.

:spe[llgood]! {word}    Add {word} as a good word to the internal word list,
                        like with "zG".

                                                        *:spellw* *:spellwrong*
:[count]spellw[rong] {word}
                        Add {word} as a wrong (bad) word to 'spellfile', as
                        with "zw". Without count the first name is used, with
                        a count of two the second entry, etc.

:spellw[rong]! {word}   Add {word} as a wrong (bad) word to the internal word
                        list.

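As an illustration of the commands above, a minimal sketch. The file names and
the words are only examples, and it assumes that 'encoding' is "utf-8" and
that the ~/.vim/spell directory exists: >
        :set spellfile=~/.vim/spell/en.utf-8.add,~/.vim/spell/proj.utf-8.add
        :spellgood Moolenaar
        :2spellgood runtimepath
        :spellwrong teh
<The first command names two word list files. ":spellgood" and ":spellwrong"
without a count use the first one, ":2spellgood" uses the second one.
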
After adding a word to 'spellfile' with the above commands its associated
".spl" file will automatically be updated and reloaded. If you change
'spellfile' manually you need to use the |:mkspell| command. This sequence of
commands mostly works well: >
        :edit <file in 'spellfile'>
<       (make changes to the spell file) >
        :mkspell! %

More details about the 'spellfile' format below |spell-wordlist-format|.

                                                        *internal-wordlist*
The internal word list is used for all buffers where 'spell' is set. It is
not stored, it is lost when you exit Vim. It is also cleared when 'encoding'
is set.


Finding suggestions for bad words:
                                                        *z?*
z?                      For the word under/after the cursor suggest correctly
                        spelled words. This also works to find alternatives
                        for a word that is not highlighted as a bad word,
                        e.g., when the word after it is bad.
                        The results are sorted on similarity to the word
                        under/after the cursor.
                        This may take a long time. Hit CTRL-C when you get
                        bored.

                        If the command is used without a count the
                        alternatives are listed and you can enter the number
                        of your choice or press <Enter> if you don't want to
                        replace. You can also use the mouse to click on your
                        choice (only works if the mouse can be used in Normal
                        mode and when there are no line wraps). Click on the
                        first line (the header) to cancel.

                        If a count is used that suggestion is used, without
                        prompting. For example, "1z?" always takes the first
                        suggestion.

                        If 'verbose' is non-zero a score will be displayed
                        with the suggestions to indicate the likeness to the
                        badly spelled word (the higher the score the more
                        different).
                        When a word was replaced the redo command "." will
                        repeat the word replacement. This works like "ciw",
                        the good word and <Esc>. This does NOT work for Thai
                        and other languages without spaces between words.

                                        *:spellr* *:spellrepall* *E752* *E753*
:spellr[epall]          Repeat the replacement done by |z?| for all matches
                        with the replaced word in the current window.

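For example, to take the first suggestion for the bad word under the cursor
and then apply the same replacement in the rest of the window (a sketch; which
suggestion comes first depends on the loaded word lists): >
        1z?
        :spellrepall
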
In Insert mode, when the cursor is after a badly spelled word, you can use
CTRL-X s to find suggestions. This works like Insert mode completion. Use
CTRL-N to use the next suggestion, CTRL-P to go back. |i_CTRL-X_s|

The 'spellsuggest' option influences how the list of suggestions is generated
and sorted. See |'spellsuggest'|.

The 'spellcapcheck' option is used to check that the first word of a sentence
starts with a capital. This doesn't work for the first word in the file.
When there is a line break right after a sentence the highlighting of the next
line may be postponed. Use |CTRL-L| when needed. Also see |set-spc-auto| for
how it can be set automatically when 'spelllang' is set.

==============================================================================
2. Remarks on spell checking                            *spell-remarks*

PERFORMANCE

Vim does on-the-fly spell checking. To make this work fast the word list is
loaded in memory. Thus this uses a lot of memory (1 Mbyte or more). There
might also be a noticeable delay when the word list is loaded, which happens
when 'spell' is set and when 'spelllang' is set while 'spell' was already set.
To minimize the delay each word list is only loaded once, it is not deleted
when 'spelllang' is made empty or 'spell' is reset. When 'encoding' is set
all the word lists are reloaded, thus you may notice a delay then too.


REGIONS

A word may be spelled differently in various regions. For example, English
comes in (at least) these variants:

        en              all regions
        en_au           Australia
        en_ca           Canada
        en_gb           Great Britain
        en_nz           New Zealand
        en_us           USA

Words that are not used in one region but are used in another region are
highlighted with SpellLocal |hl-SpellLocal|.

Always use lowercase letters for the language and region names.

When adding a word with |zg| or another command it's always added for all
regions. You can change that by manually editing the 'spellfile'. See
|spell-wordlist-format|. Note that the regions as specified in the files in
'spellfile' are only used when all entries in 'spelllang' specify the same
region (not counting files specified by their .spl name).

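A few example settings, as a sketch. Which languages and regions are available
depends on the spell files that are installed; "nl" is only an assumption
here: >
        " Accept the words of all English regions.
        :setlocal spelllang=en
        " British English; words only used in other regions get SpellLocal.
        :setlocal spelllang=en_gb
        " Check for US English and Dutch at the same time.
        :setlocal spelllang=en_us,nl
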
                                                        *spell-german*
Specific exception: For German these special regions are used:
        de              all German words accepted
        de_de           old and new spelling
        de_19           old spelling
        de_20           new spelling
        de_at           Austria
        de_ch           Switzerland

                                                        *spell-yiddish*
Yiddish requires using "utf-8" encoding, because of the special characters
used. If you are using latin1 Vim will use transliterated (romanized) Yiddish
instead. If you want to use transliterated Yiddish with utf-8 use "yi-tr".
In a table:
        'encoding'      'spelllang'
        utf-8           yi              Yiddish
        latin1          yi              transliterated Yiddish
        utf-8           yi-tr           transliterated Yiddish


SPELL FILES                                             *spell-load*

Vim searches for spell files in the "spell" subdirectory of the directories in
'runtimepath'. The name is: LL.EEE.spl, where:
        LL      the language name
        EEE     the value of 'encoding'

The value for "LL" comes from 'spelllang', but excludes the region name.
Examples:
        'spelllang'     LL ~
        en_us           en
        en-rare         en-rare
        medical_ca      medical

Only the first file is loaded, the one that is first in 'runtimepath'. If
this succeeds then additionally files with the name LL.EEE.add.spl are loaded.
All the ones that are found are used.

Additionally, the files related to the names in 'spellfile' are loaded. These
are the files that |zg| and |zw| add good and wrong words to.

Exceptions:
- Vim uses "latin1" when 'encoding' is "iso-8859-15". The euro sign doesn't
  matter for spelling.
- When no spell file for 'encoding' is found "ascii" is tried. This only
  works for languages where nearly all words are ASCII, such as English. It
  helps when 'encoding' is not "latin1", such as iso-8859-2, and English text
  is being edited. For the ".add" files the same name as the found main
  spell file is used.

For example, with these values:
        'runtimepath' is "~/.vim,/usr/share/vim70,~/.vim/after"
        'encoding' is "iso-8859-2"
        'spelllang' is "pl"

Vim will look for:
1. ~/.vim/spell/pl.iso-8859-2.spl
2. /usr/share/vim70/spell/pl.iso-8859-2.spl
3. ~/.vim/spell/pl.iso-8859-2.add.spl
4. /usr/share/vim70/spell/pl.iso-8859-2.add.spl
5. ~/.vim/after/spell/pl.iso-8859-2.add.spl

This assumes 1. is not found and 2. is found.

If 'encoding' is "latin1" Vim will look for:
1. ~/.vim/spell/pl.latin1.spl
2. /usr/share/vim70/spell/pl.latin1.spl
3. ~/.vim/after/spell/pl.latin1.spl
4. ~/.vim/spell/pl.ascii.spl
5. /usr/share/vim70/spell/pl.ascii.spl
6. ~/.vim/after/spell/pl.ascii.spl

This assumes none of them are found (Polish doesn't make sense when leaving
out the non-ASCII characters).

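To see which of these files actually exist on your system, something like this
can be used (a sketch; it only lists the file names that are found, it does
not tell which one Vim has loaded): >
        :echo globpath(&runtimepath, "spell/pl.*.spl")
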
Spelling for EBCDIC is currently not supported.

A spell file might not be available in the current 'encoding'. See
|spell-mkspell| about how to create a spell file. Converting a spell file
with "iconv" will NOT work!

                                                        *E758* *E759*
When loading a spell file Vim checks that it is properly formatted. If you
get an error the file may be truncated, modified or intended for another Vim
version.


WORDS

Vim uses a fixed method to recognize a word. This is independent of
'iskeyword', so that it also works in help files and for languages that
include characters like '-' in 'iskeyword'. The word characters do depend on
'encoding'.

The table with word characters is stored in the main .spl file. Therefore it
matters what the current locale is when generating it! A .add.spl file does
not contain a word table though.

A word that starts with a digit is always ignored. That includes hex numbers
in the form 0xff and 0XFF.


WORD COMBINATIONS

It is possible to spell-check words that include a space. This is used to
recognize words that are invalid when used by themselves, e.g. for "et al.".
It can also be used to recognize "the the" and highlight it.

The number of spaces is irrelevant. In most cases a line break may also
appear. However, this makes it difficult to find out where to start checking
for spelling mistakes. When you make a change to one line and only that line
is redrawn Vim won't look in the previous line, thus when "et" is at the end
of the previous line "al." will be flagged as an error. And when you type
"the<CR>the" the highlighting doesn't appear until the first line is redrawn.
Use |CTRL-L| to redraw right away. "[s" will also stop at a word combination
with a line break.

When encountering a line break Vim skips characters such as '*', '>' and '"',
so that comments in C, shell and Vim code can be spell checked.


SYNTAX HIGHLIGHTING                                     *spell-syntax*

Files that use syntax highlighting can specify where spell checking should be
done:

1. everywhere                      default
2. in specific items               use "contains=@Spell"
3. everywhere but specific items   use "contains=@NoSpell"

For the second method adding the @NoSpell cluster will disable spell checking
again. This can be used, for example, to add @Spell to the comments of a
program, and add @NoSpell for items that shouldn't be checked.

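A minimal sketch of the second method for a syntax file. The group names
"myComment" and "myURL" are made up for this example: >
        :syntax region myComment start="/\*" end="\*/" contains=@Spell,myURL
        :syntax match myURL "http://\S\+" contained contains=@NoSpell
<With these two items spelling is checked inside comments only, and URLs that
appear in a comment are skipped.
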

VIM SCRIPTS

If you want to write a Vim script that does something with spelling, you may
find these functions useful:

    spellbadword()      find badly spelled word at the cursor
    spellsuggest()      get list of spelling suggestions
    soundfold()         get the sound-alike version of a word

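A small sketch that uses two of these functions. It assumes that
spellbadword() returns either a String or, as in later Vim versions, a List
with the word as its first item. The function name is made up: >
        function! SuggestAtCursor()
          let bad = spellbadword()
          " Accept both the String and the List form of the result.
          let word = type(bad) == type([]) ? bad[0] : bad
          if word == ''
            echo 'No badly spelled word found'
            return
          endif
          " Show at most five suggestions for the word.
          echo word . ': ' . join(spellsuggest(word, 5), ', ')
        endfunction
<Use ":call SuggestAtCursor()" to try it. Note that spellbadword() may move
the cursor to the bad word.
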

SETTING 'spellcapcheck' AUTOMATICALLY                   *set-spc-auto*

After the 'spelllang' option has been set successfully, Vim will source the
files "spell/LANG.vim" in 'runtimepath'. "LANG" is the value of 'spelllang'
up to the first comma, dot or underscore. This can be used to set options
specifically for the language, especially 'spellcapcheck'.

The distribution includes a few of these files. Use this command to see what
they do: >
        :next $VIMRUNTIME/spell/*.vim

Note that the default scripts don't set 'spellcapcheck' if it was changed from
the default value. This assumes the user prefers another value then.

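As a sketch, for a hypothetical language "xx" in which a sentence does not
need to start with a capital, a file ~/.vim/spell/xx.vim could contain: >
        " Switch off the check for a capital at the start of a sentence.
        setlocal spellcapcheck=
<Unlike the distributed scripts, this example sets the option unconditionally.
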

DOUBLE SCORING                                          *spell-double-scoring*

The 'spellsuggest' option can be used to select "double" scoring. This
mechanism is based on the principle that there are two kinds of spelling
mistakes:

1. You know how to spell the word, but mistype something. This results in a
   small editing distance (character swapped/omitted/inserted) and possibly a
   word that sounds completely different.

2. You don't know how to spell the word and type something that sounds right.
   The edit distance can be big but the word is similar after sound-folding.

Since scores for these two mistakes will be very different we use a list
for each and mix them.

The sound-folding is slow and people that know the language won't make the
second kind of mistakes. Therefore 'spellsuggest' can be set to select the
preferred method for scoring the suggestions.

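For example, to select this method and then check the current value: >
        :set spellsuggest=double
        :set spellsuggest?
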
==============================================================================
3. Generating a spell file                              *spell-mkspell*

Vim uses a binary file format for spelling. This greatly speeds up loading
the word list and keeps it small.
                                                *.aff* *.dic* *Myspell*
You can create a Vim spell file from the .aff and .dic files that Myspell
uses. Myspell is used by OpenOffice.org and Mozilla. You should be able to
find them here:
        http://lingucomponent.openoffice.org/spell_dic.html
You can also use a plain word list. The results are the same, the choice
depends on what word lists you can find.

If you install Aap (from www.a-a-p.org) you can use the recipes in the
runtime/spell/??/ directories. Aap will take care of downloading the files,
applying patches needed for Vim and building the .spl file.

Make sure your current locale is set properly, otherwise Vim doesn't know what
characters are upper/lower case letters. If the locale isn't available (e.g.,
when using an MS-Windows codepage on Unix) add tables to the .aff file
|spell-affix-chars|. If the .aff file doesn't define a table then the word
table of the currently active spelling is used. If spelling is not active
then Vim will try to guess.

                                                        *:mksp* *:mkspell*
:mksp[ell][!] [-ascii] {outname} {inname} ...
                        Generate a Vim spell file from word lists. Example: >
                :mkspell /tmp/nl nl_NL.words
<                                                       *E751*
                        When {outname} ends in ".spl" it is used as the output
                        file name. Otherwise it should be a language name,
                        such as "en", without the region name. The file
                        written will be "{outname}.{encoding}.spl", where
                        {encoding} is the value of the 'encoding' option.

                        When the output file already exists [!] must be used
                        to overwrite it.

                        When the [-ascii] argument is present, words with
                        non-ascii characters are skipped. The resulting file
                        ends in "ascii.spl".

                        The input can be the Myspell format files {inname}.aff
                        and {inname}.dic. If {inname}.aff does not exist then
                        {inname} is used as the file name of a plain word
                        list.

                        Multiple {inname} arguments can be given to combine
                        regions into one Vim spell file. Example: >
                :mkspell ~/.vim/spell/en /tmp/en_US /tmp/en_CA /tmp/en_AU
<                       This combines the English word lists for US, CA and AU
                        into one en.spl file.
                        Up to eight regions can be combined. *E754* *E755*
                        The REP and SAL items of the first .aff file where
                        they appear are used. |spell-REP| |spell-SAL|

                        This command uses a lot of memory, required to find
                        the optimal word tree (Polish, Italian and Hungarian
                        require several hundred Mbyte). The final result will
                        be much smaller, because compression is used. To
                        avoid running out of memory compression will be done
                        now and then. This can be tuned with the 'mkspellmem'
                        option.

                        After the spell file was written and it was being used
                        in a buffer it will be reloaded automatically.

:mksp[ell] [-ascii] {name}.{enc}.add
                        Like ":mkspell" above, using {name}.{enc}.add as the
                        input file and producing an output file in the same
                        directory that has ".spl" appended.

:mksp[ell] [-ascii] {name}
                        Like ":mkspell" above, using {name} as the input file
                        and producing an output file in the same directory
                        that has ".{enc}.spl" appended.

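For example, assuming that 'encoding' is "latin1" and that these (made up)
word list files exist: >
        :mkspell ~/.vim/spell/en.latin1.add
        :mkspell /tmp/mywords
<The first command writes ~/.vim/spell/en.latin1.add.spl, the second one
writes /tmp/mywords.latin1.spl.
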
Vim will report the number of duplicate words. This might be a mistake in the
list of words. But sometimes it is used to have different prefixes and
suffixes for the same basic word to avoid them combining (e.g. Czech uses
this). If you want Vim to report all duplicate words set the 'verbose'
option.

Since you might want to change a Myspell word list for use with Vim the
following procedure is recommended:

1. Obtain the xx_YY.aff and xx_YY.dic files from Myspell.
2. Make a copy of these files to xx_YY.orig.aff and xx_YY.orig.dic.
3. Change the xx_YY.aff and xx_YY.dic files to remove bad words, add missing
   words, define word characters with FOL/LOW/UPP, etc. The distributed
   "src/spell/*.diff" files can be used.
4. Start Vim with the right locale and use |:mkspell| to generate the Vim
   spell file.
5. Try out the spell file with ":set spell spelllang=xx" if you wrote it in
   a spell directory in 'runtimepath', or ":set spelllang=xx.enc.spl" if you
   wrote it somewhere else.

When the Myspell files are updated you can merge the differences:
1. Obtain the new Myspell files as xx_YY.new.aff and xx_YY.new.dic.
2. Use Vimdiff to see what changed: >
        vimdiff xx_YY.orig.dic xx_YY.new.dic
3. Take over the changes you like in xx_YY.dic.
   You may also need to change xx_YY.aff.
4. Rename xx_YY.new.dic to xx_YY.orig.dic and xx_YY.new.aff to xx_YY.orig.aff.


SPELL FILE VERSIONS                                     *E770* *E771* *E772*

Spell checking is a relatively new feature in Vim, thus it's possible that the
.spl file format will be changed to support more languages. Vim will check
the validity of the spell file and report anything wrong.

        E771: Old spell file, needs to be updated ~
This spell file is older than your Vim. You need to update the .spl file.

        E772: Spell file is for newer version of Vim ~
This means the spell file was made for a later version of Vim. You need to
update Vim.

        E770: Unsupported section in spell file ~
This means the spell file was made for a later version of Vim and contains a
section that is required for the spell file to work. In this case it's
probably a good idea to upgrade your Vim.


SPELL FILE DUMP

If for some reason you want to check what words are supported by the currently
used spelling files, use this command:

                                                        *:spelldump* *:spelld*
:spelld[ump]            Open a new window and fill it with all currently valid
                        words. Compound words are not included.
                        Note: For some languages the result may be enormous,
                        causing Vim to run out of memory.

The word list format is used for the output |spell-wordlist-format|. You
should be able to read it with ":mkspell" to generate one .spl file that
includes all the words.

When all entries in 'spelllang' use the same regions or no regions at all then
the region information is included in the dumped words. Otherwise only words
for the current region are included and no "/regions" line is generated.

Comment lines with the name of the .spl file are used as a header above the
words that were generated from that .spl file.

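A possible sequence, as a sketch with made up file names: >
        :spelldump
        :write /tmp/dump.words
        :mkspell /tmp/dump /tmp/dump.words
<With 'encoding' set to "utf-8" the last command would write the file
/tmp/dump.utf-8.spl.
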
==============================================================================
4. Spell file format                                    *spell-file-format*

This is the format of the files that are used by the person who creates and
maintains a word list.

Note that we avoid the word "dictionary" here. That is because the goal of
spell checking differs from writing a dictionary (as in the book). For
spelling we need a list of words that are OK, thus should not be
highlighted. Person and company names will not appear in a dictionary, but do
appear in a word list. And some old words are rarely used while they are
common misspellings. These do appear in a dictionary but not in a word list.

There are two formats: A straight list of words and a list using affix
compression. The files with affix compression are used by Myspell (Mozilla
and OpenOffice.org). This requires two files, one with .aff and one with .dic
extension.


Bram Moolenaard042c562005-06-30 22:04:15 +0000558FORMAT OF STRAIGHT WORD LIST *spell-wordlist-format*
Bram Moolenaar75c50c42005-06-04 22:06:24 +0000559
Bram Moolenaar82cf9b62005-06-07 21:09:25 +0000560The words must appear one per line. That is all that is required.
Bram Moolenaard042c562005-06-30 22:04:15 +0000561
Bram Moolenaar82cf9b62005-06-07 21:09:25 +0000562Additionally the following items are recognized:
Bram Moolenaard042c562005-06-30 22:04:15 +0000563
Bram Moolenaar75c50c42005-06-04 22:06:24 +0000564- Empty and blank lines are ignored.
Bram Moolenaard042c562005-06-30 22:04:15 +0000565
Bram Moolenaar75c50c42005-06-04 22:06:24 +0000566- Lines starting with a # are ignored (comment lines).
Bram Moolenaard042c562005-06-30 22:04:15 +0000567
- A line starting with "/encoding=", before any word, specifies the encoding
  of the file. After the '=' comes an encoding name. This tells Vim to set up
  conversion from the specified encoding to 'encoding'. Thus you can use one
  word list for several target encodings.
572
- A line starting with "/regions=" specifies the region names that are
  supported. Each region name must be two ASCII letters. The first one is
  region 1. Thus "/regions=usca" has region 1 "us" and region 2 "ca".
  In an additional word list the region names should be equal to those of the
  main word list!
578
Bram Moolenaar82cf9b62005-06-07 21:09:25 +0000579- Other lines starting with '/' are reserved for future use. The ones that
580 are not recognized are ignored (but you do get a warning message).
Bram Moolenaar75c50c42005-06-04 22:06:24 +0000581
Bram Moolenaar1f8a5f02005-07-01 22:41:52 +0000582- A "/" may follow the word with the following items:
583 = Case must match exactly.
584 ? Rare word.
585 ! Bad (wrong) word.
586 digit A region in which the word is valid. If no regions are
587 specified the word is valid in all regions.
588
Example:

    # This is an example word list    comment
    /encoding=latin1                  encoding of the file
    /regions=uscagb                   regions "us", "ca" and "gb"
    example                           word for all regions
    blah/12                           word for regions "us" and "ca"
    vim/!                             bad word
    Campbell/?3                       rare word in region 3 "gb"
    's mornings/=                     keep-case word
Bram Moolenaar3638c682005-06-08 22:05:14 +0000599
Note that when "/=" is used the same word with all upper-case letters is not
accepted. This is different from a word with mixed case that is automatically
marked as keep-case: those words may appear in all upper-case letters.
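
The rules for the straight word list can be sketched in a few lines of Python.
This is only an illustration of the format described above, not how Vim reads
the file. The function name parse_wordlist() and the layout of the result are
made up for this example, and the explanatory text in the right-hand column of
the example is assumed not to be part of the file.

```python
def parse_wordlist(lines):
    """Parse straight word list lines into (encoding, regions, entries)."""
    encoding = None
    regions = []
    entries = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue  # empty, blank and comment lines are ignored
        if line.startswith("/encoding="):
            encoding = line[len("/encoding="):].strip()
            continue
        if line.startswith("/regions="):
            names = line[len("/regions="):].strip()
            # Each region name is two ASCII letters; the first is region 1.
            regions = [names[i:i + 2] for i in range(0, len(names), 2)]
            continue
        if line.startswith("/"):
            continue  # reserved for future use (Vim warns; this sketch skips)
        word, _, items = line.partition("/")
        entries.append({
            "word": word,
            "keepcase": "=" in items,   # case must match exactly
            "rare": "?" in items,       # rare word
            "bad": "!" in items,        # bad (wrong) word
            # Region numbers; an empty list means valid in all regions.
            "regions": [int(c) for c in items if c.isdigit()],
        })
    return encoding, regions, entries
```

An entry such as "blah/12" then comes out with regions [1, 2], which can be
looked up in the region name list ("us" and "ca" in the example).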
603
Bram Moolenaar75c50c42005-06-04 22:06:24 +0000604
605FORMAT WITH AFFIX COMPRESSION
606
Bram Moolenaar13fcaaf2005-04-15 21:13:42 +0000607There are two files: the basic word list and an affix file. The affixes are
608used to modify the basic words to get the full word list. This significantly
609reduces the number of words, especially for a language like Polish. This is
610called affix compression.
Bram Moolenaar217ad922005-03-20 22:37:15 +0000611
Bram Moolenaar13fcaaf2005-04-15 21:13:42 +0000612The basic word list and the affix file are combined and turned into a binary
613spell file. All the preprocessing has been done, thus this file loads fast.
614The binary spell file format is described in the source code (src/spell.c).
615But only developers need to know about it.
616
617The preprocessing also allows us to take the Myspell language files and modify
618them before the Vim word list is made. The tools for this can be found in the
619"src/spell" directory.
620
The format for the affix and word list files is based on what Myspell uses
(the spell checker of Mozilla and OpenOffice.org). A description can be found
here:
	http://lingucomponent.openoffice.org/affix.readme ~
Note that affixes are case sensitive; this isn't obvious from the description.
626
627Vim does not use the TRY item, it is ignored. For making suggestions the
628possible characters in the words are used.
629
630Vim supports quite a few extras. They are described below |spell-affix-vim|.
631Attempts have been made to keep this compatible with other spell checkers, so
632that the same files can be used.
633
Bram Moolenaar13fcaaf2005-04-15 21:13:42 +0000634
Bram Moolenaar3638c682005-06-08 22:05:14 +0000635WORD LIST FORMAT *spell-dic-format*
Bram Moolenaar13fcaaf2005-04-15 21:13:42 +0000636
A very short example, with line numbers:

     1  1234
     2  aan
     3  Als
     4  Etten-Leur
     5  et al.
     6  's-Gravenhage
     7  's-Gravenhaags
     8  bedel/P
     9  kado/1
    10  cadeau/2
    11  TCP,IP
Bram Moolenaar13fcaaf2005-04-15 21:13:42 +0000650
Bram Moolenaar82cf9b62005-06-07 21:09:25 +0000651The first line contains the number of words. Vim ignores it, but you do get
652an error message if it's not there. *E760*
Bram Moolenaar13fcaaf2005-04-15 21:13:42 +0000653
Bram Moolenaar82cf9b62005-06-07 21:09:25 +0000654What follows is one word per line. There should be no white space before or
Bram Moolenaar6e7c7f32005-08-24 22:16:11 +0000655after the word. After the word there is an optional slash and flags. Most of
656these flags are letters that indicate the affixes that can be used with this
657word. These are specified with SFX and PFX lines in the .aff file. See the
658Myspell documentation. Vim allows using other flag types with the FLAG item
659in the affix file |spell-FLAG|.
Bram Moolenaar13fcaaf2005-04-15 21:13:42 +0000660
661When the word only has lower-case letters it will also match with the word
662starting with an upper-case letter.
663
664When the word includes an upper-case letter, this means the upper-case letter
665is required at this position. The same word with a lower-case letter at this
666position will not match. When some of the other letters are upper-case it will
667not match either.
668
Bram Moolenaard042c562005-06-30 22:04:15 +0000669The word with all upper-case characters will always be OK.
Bram Moolenaar13fcaaf2005-04-15 21:13:42 +0000670
    word list    matches        does not match ~
    als          als Als ALS    ALs AlS aLs aLS
    Als          Als ALS        als ALs AlS aLs aLS
    ALS          ALS            als Als ALs AlS aLs aLS
    AlS          AlS ALS        als Als ALs aLs aLS
676
Bram Moolenaar82cf9b62005-06-07 21:09:25 +0000677The KEP affix ID can be used to specifically match a word with identical case
Bram Moolenaar6f16eb82005-08-23 21:02:42 +0000678only, see below |spell-KEP|.
Bram Moolenaar45eeb132005-06-06 21:59:07 +0000679
Note in lines 5 to 7 that non-word characters are used. You can include
any character in a word. When checking the text a word still only matches
when it appears with a non-word character before and after it. For Myspell a
word starting with a non-word character probably won't work.

In line 11 the word "TCP/IP" is defined. Since the slash has a special
meaning the comma is used instead. This is defined with the SLASH item in the
affix file, see |spell-SLASH|. Note that without this SLASH item the
word will be "TCP,IP".
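
Splitting a .dic file into words and flags, including the SLASH replacement,
could look like the Python sketch below. The function name parse_dic() and its
"slash" argument are invented for this illustration. It assumes single lines
of the form word or word/flags and does not interpret the flags.

```python
def parse_dic(lines, slash=None):
    """Return (word, flags) pairs from the lines of a .dic file.

    "slash" is the replacement character defined with the SLASH item in
    the affix file, or None when there is no SLASH item.
    """
    # The first line holds the word count; the value itself is not used,
    # but the line must be there.
    if not lines or not lines[0].strip().isdigit():
        raise ValueError("first line must contain the word count")
    entries = []
    for line in lines[1:]:
        line = line.rstrip("\n")
        if not line:
            continue
        # Everything after the first slash is the list of flags.
        word, _, flags = line.partition("/")
        if slash is not None:
            word = word.replace(slash, "/")
        entries.append((word, flags))
    return entries
```

With slash="," the line "TCP,IP" yields the word "TCP/IP"; without it the word
stays "TCP,IP".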
689
Bram Moolenaar13fcaaf2005-04-15 21:13:42 +0000690 *spell-affix-vim*
Bram Moolenaar82cf9b62005-06-07 21:09:25 +0000691A flag that Vim adds and is not in Myspell is the flag defined with KEP in the
Bram Moolenaar45eeb132005-06-06 21:59:07 +0000692affix file. This has the meaning that case matters. This can be used if the
693word does not have the first letter in upper case at the start of a sentence.
Bram Moolenaar82cf9b62005-06-07 21:09:25 +0000694Example (assuming that = was used for KEP):
Bram Moolenaar13fcaaf2005-04-15 21:13:42 +0000695
    word list        matches                  does not match ~
    's morgens/=     's morgens               'S morgens 's Morgens 'S MORGENS
    's Morgens       's Morgens 'S MORGENS    'S morgens 's morgens
699
700The flag can also be used to avoid that the word matches when it is in all
701upper-case letters.
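
The case rules from the two tables above can be summarized in a small Python
function. This is a sketch of the rules as stated in this text, not Vim's
implementation. The name case_matches() is made up, and Python's own
upper/lower conversion is used instead of the FOL/LOW/UPP tables.

```python
def case_matches(entry, text, keepcase=False):
    """Return True when "text" is accepted for word list entry "entry"."""
    if text == entry:
        return True            # identical case always matches
    if keepcase:
        return False           # keep-case flag: only identical case is OK
    if text == entry.upper():
        return True            # the word in all upper-case is always OK
    if entry == entry.lower():
        # An all lower-case entry also matches with the first letter in
        # upper-case.
        return text == entry[:1].upper() + entry[1:]
    return False               # upper-case letters in the entry are required
```

For example, case_matches("als", "Als") is accepted, while
case_matches("Als", "als") is not.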
Bram Moolenaar13fcaaf2005-04-15 21:13:42 +0000702
703 *spell-affix-mbyte*
704The basic word list is normally in an 8-bit encoding, which is mentioned in
705the affix file. The affix file must always be in the same encoding as the
706word list. This is compatible with Myspell. For Vim the encoding may also be
707something else, any encoding that "iconv" supports. The "SET" line must
708specify the name of the encoding. When using a multi-byte encoding it's
Bram Moolenaard042c562005-06-30 22:04:15 +0000709possible to use more different affixes (but Myspell doesn't support that, thus
710you may not want to use it anyway).
Bram Moolenaar13fcaaf2005-04-15 21:13:42 +0000711
Bram Moolenaare13305e2005-06-19 22:54:15 +0000712
713CHARACTER TABLES
Bram Moolenaar0cb032e2005-04-23 20:52:00 +0000714 *spell-affix-chars*
When using an 8-bit encoding (as specified with SET) the affix file should
define what characters are word characters. This is because the system where
":mkspell" is used may not support a locale with this encoding and isalpha()
won't work. For example when using "cp1250" on Unix.
Bram Moolenaar0cb032e2005-04-23 20:52:00 +0000719
Bram Moolenaar6f16eb82005-08-23 21:02:42 +0000720 *E761* *E762* *spell-FOL*
721 *spell-LOW* *spell-UPP*
Bram Moolenaar0cb032e2005-04-23 20:52:00 +0000722Three lines in the affix file are needed. Simplistic example:
723
Bram Moolenaare13305e2005-06-19 22:54:15 +0000724 FOL áëñ ~
725 LOW áëñ ~
726 UPP ÁËÑ ~
Bram Moolenaar0cb032e2005-04-23 20:52:00 +0000727
728All three lines must have exactly the same number of characters.
729
730The "FOL" line specifies the case-folded characters. These are used to
731compare words while ignoring case. For most encodings this is identical to
732the lower case line.
733
734The "LOW" line specifies the characters in lower-case. Mostly it's equal to
735the "FOL" line.
736
737The "UPP" line specifies the characters with upper-case. That is, a character
738is upper-case where it's different from the character at the same position in
739"FOL".
740
Bram Moolenaar6f16eb82005-08-23 21:02:42 +0000741An exception is made for the German sharp s ß. The upper-case version is
742"SS". In the FOL/LOW/UPP lines it should be included, so that it's recognized
743as a word character, but use the ß character in all three.
744
Bram Moolenaar0cb032e2005-04-23 20:52:00 +0000745ASCII characters should be omitted, Vim always handles these in the same way.
746When the encoding is UTF-8 no word characters need to be specified.
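
How the three lines relate can be illustrated with a short Python sketch. It
checks the "same number of characters" rule and derives the case tables
position by position. The function name build_case_tables() and the returned
structures are assumptions made for this example only.

```python
def build_case_tables(fol, low, upp):
    """Build case tables from the FOL, LOW and UPP character strings."""
    if not (len(fol) == len(low) == len(upp)):
        raise ValueError(
            "FOL, LOW and UPP must have the same number of characters")
    to_fold = {}    # character -> case-folded character
    to_upper = {}   # lower-case character -> upper-case character
    is_upper = set()
    for f, l, u in zip(fol, low, upp):
        to_fold[l] = f
        to_fold[u] = f
        to_upper[l] = u
        # A character is upper-case where it differs from the character
        # at the same position in FOL.
        if u != f:
            is_upper.add(u)
    return to_fold, to_upper, is_upper
```

Listing the German sharp s in all three lines makes it a word character
without marking it as upper-case, since it is equal to the FOL character.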
747
748 *E763*
Bram Moolenaar3b506942005-06-23 22:36:45 +0000749Vim allows you to use spell checking for several languages in the same file.
750You can list them in the 'spelllang' option. As a consequence all spell files
751for the same encoding must use the same word characters, otherwise they can't
752be combined without errors. If you get a warning that the word tables differ
753you may need to generate the .spl file again with |:mkspell|. Check the FOL,
754LOW and UPP lines in the used .aff file.
755
The XX.ascii.spl spell file generated with the "-ascii" argument will not
contain the table with characters, so that it can be combined with spell files
for any encoding. The .add.spl files also do not contain the table.
Bram Moolenaar0cb032e2005-04-23 20:52:00 +0000759
Bram Moolenaare7566042005-06-17 22:00:15 +0000760
Bram Moolenaar9a50b1b2005-06-27 22:48:21 +0000761MID-WORD CHARACTERS
762 *spell-midword*
763Some characters are only to be considered word characters if they are used in
764between two ordinary word characters. An example is the single quote: It is
765often used to put text in quotes, thus it can't be recognized as a word
766character, but when it appears in between word characters it must be part of
767the word. This is needed to detect a spelling error such as they'are. That
768should be they're, but since "they" and "are" are words themselves that would
769go unnoticed.
770
771These characters are defined with MIDWORD in the .aff file:
772
773 MIDWORD '- ~
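
The effect of MIDWORD on splitting text into words can be sketched in Python
as follows. This is a simplified illustration: word characters are taken to be
whatever str.isalpha() accepts, and the name split_words() is made up.

```python
def split_words(text, midword="'-"):
    """Split text into words, honoring mid-word characters."""
    words = []
    cur = ""
    for i, ch in enumerate(text):
        if ch.isalpha():
            cur += ch
        elif (ch in midword and cur
              and i + 1 < len(text) and text[i + 1].isalpha()):
            # A mid-word character only counts as part of the word when it
            # sits in between two ordinary word characters.
            cur += ch
        else:
            if cur:
                words.append(cur)
            cur = ""
    if cur:
        words.append(cur)
    return words
```

With this, "they'are" is checked as one word and can be flagged, while quotes
around a word are not included in it.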
774
775
Bram Moolenaar6e7c7f32005-08-24 22:16:11 +0000776FLAG TYPES *spell-FLAG*
777
778Flags are used to specify the affixes that can be used with a word and for
779other properties of the word. Normally single-character flags are used. This
780limits the number of possible flags, especially for 8-bit encodings. The FLAG
781item can be used if more affixes are to be used. Possible values:
782
783 FLAG long use two-character flags
784 FLAG num use numbers, from 1 up to 65000
Bram Moolenaar81f1ecb2005-08-25 21:27:31 +0000785 FLAG caplong use one-character flags without A-Z and two-character
Bram Moolenaar6e7c7f32005-08-24 22:16:11 +0000786 flags that start with A-Z
787
788With "FLAG num" the numbers in a list of affixes need to be separated with a
789comma: "234,2143,1435". This method is inefficient, but useful if the file is
790generated with a program.
791
Bram Moolenaar81f1ecb2005-08-25 21:27:31 +0000792When using "caplong" the two-character flags all start with a capital: "Aa",
793"B1", "BB", etc. This is useful to use one-character flags for the most
794common items and two-character flags for uncommon items.
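
Splitting the flags that follow a word according to the FLAG type might be
done as in this Python sketch. The function name split_flags() is
hypothetical, and the checks Vim does on the flag values are left out.

```python
def split_flags(flags, flag_type=None):
    """Split the flags after a word according to the FLAG item."""
    if flag_type is None:
        return list(flags)                  # single-character flags
    if flag_type == "long":
        if len(flags) % 2:
            raise ValueError("odd number of characters for FLAG long")
        return [flags[i:i + 2] for i in range(0, len(flags), 2)]
    if flag_type == "num":
        # Numbers are separated with a comma.
        return [int(n) for n in flags.split(",") if n]
    if flag_type == "caplong":
        out = []
        i = 0
        while i < len(flags):
            # A flag starting with A-Z takes two characters, others one.
            step = 2 if "A" <= flags[i] <= "Z" else 1
            out.append(flags[i:i + step])
            i += step
        return out
    raise ValueError("unknown FLAG type: " + flag_type)
```

For example, with "caplong" the flags "aB1cBB" are split into "a", "B1", "c"
and "BB".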
Bram Moolenaar6e7c7f32005-08-24 22:16:11 +0000795
796Note: When using utf-8 only characters up to 65000 may be used for flags.
797
798
Bram Moolenaare13305e2005-06-19 22:54:15 +0000799AFFIXES
Bram Moolenaar6f16eb82005-08-23 21:02:42 +0000800 *spell-PFX* *spell-SFX*
Bram Moolenaare13305e2005-06-19 22:54:15 +0000801The usual PFX (prefix) and SFX (suffix) lines are supported (see the Myspell
Bram Moolenaar9a50b1b2005-06-27 22:48:21 +0000802documentation or the Aspell manual:
803http://aspell.net/man-html/Affix-Compression.html).
Bram Moolenaare13305e2005-06-19 22:54:15 +0000804
Bram Moolenaar9a50b1b2005-06-27 22:48:21 +0000805Note that Myspell ignores any extra text after the relevant info. Vim
806requires this text to start with a "#" so that mistakes don't go unnoticed.
807Example:
808
809 SFX F 0 in [^i]n # Spion > Spionin ~
810 SFX F 0 nen in # Bauerin > Bauerinnen ~
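
To show what the fields of such an SFX line do, here is a Python sketch that
applies one suffix rule to a word. It assumes the fields are, in order, the
text to strip ("0" for nothing), the text to add and the condition, and it
treats the condition as a Python regular expression anchored at the end of
the word. That is a simplification; the name apply_suffix() is made up.

```python
import re


def apply_suffix(word, strip, add, cond):
    """Apply one SFX rule; return the new word or None when not applicable."""
    # The condition must match at the end of the word.
    if not re.search("(?:" + cond + ")$", word):
        return None
    if strip != "0":
        if not word.endswith(strip):
            return None
        word = word[:len(word) - len(strip)]
    if add != "0":
        word += add
    return word
```

apply_suffix("Spion", "0", "in", "[^i]n") gives "Spionin", as in the comment
of the first example line.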
811
Bram Moolenaar81f1ecb2005-08-25 21:27:31 +0000812Apparently Myspell allows an affix name to appear more than once. Since this
813might also be a mistake, Vim checks for an extra "S". The affix files for
814Myspell that use this feature apparently have this flag. Example:
815
816 SFX a Y 1 S ~
817 SFX a 0 an . ~
818
819 SFX a Y 2 S ~
820 SFX a 0 en . ~
821 SFX a 0 on . ~
822
Bram Moolenaar8aff23a2005-08-19 20:40:30 +0000823 *spell-affix-rare*
Bram Moolenaar9a50b1b2005-06-27 22:48:21 +0000824An extra item for Vim is the "rare" flag. It must come after the other
825fields, before a comment. When used then all words that use the affix will be
826marked as rare words. Example:
827
828 PFX F 0 nene . rare ~
829 SFX F 0 oin n rare # hardly ever used ~
830
831However, if the word also appears as a good word in another way it won't be
832marked as rare.
Bram Moolenaare13305e2005-06-19 22:54:15 +0000833
Bram Moolenaar8aff23a2005-08-19 20:40:30 +0000834 *spell-affix-nocomp*
835Another extra item for Vim is the "nocomp" flag. It must come after the other
Bram Moolenaar90915b52005-08-21 22:17:52 +0000836fields, before a comment. It can be either before or after "rare". When
837present then all words that use the affix will not be part of a compound word.
Bram Moolenaar8aff23a2005-08-19 20:40:30 +0000838Example:
839 affix file:
840 COMPOUNDFLAG c ~
841 SFX a Y 2 ~
842 SFX a 0 s . ~
843 SFX a 0 ize . nocomp ~
844 dictionary:
845 word/c ~
846 util/ac ~
847
848This allows for "wordutil" and "wordutils" but not "wordutilize".
849
Bram Moolenaar6f16eb82005-08-23 21:02:42 +0000850 *spell-PFXPOSTPONE*
Bram Moolenaare13305e2005-06-19 22:54:15 +0000851When an affix file has very many prefixes that apply to many words it's not
852possible to build the whole word list in memory. This applies to Hebrew (a
853list with all words is over a Gbyte). In that case applying prefixes must be
854postponed. This makes spell checking slower. It is indicated by this keyword
855in the .aff file:
856
857 PFXPOSTPONE ~
858
Only prefixes without a chop string can be postponed, prefixes with a chop
string will still be included in the word list. An exception is made when the
chop string is one character and equal to the last character of the added
string, but in lower case. This is the case when the chop string is used to
allow the following word to start with an upper-case letter.
Bram Moolenaare13305e2005-06-19 22:54:15 +0000864
Bram Moolenaarae5bce12005-08-15 21:41:48 +0000865
Bram Moolenaar6f16eb82005-08-23 21:02:42 +0000866WORDS WITH A SLASH *spell-SLASH*
Bram Moolenaarae5bce12005-08-15 21:41:48 +0000867
868The slash is used in the .dic file to separate the basic word from the affix
869letters that can be used. Unfortunately, this means you cannot use a slash in
870a word. Thus "TCP/IP" cannot be a word. To work around that you can define a
871replacement character for the slash. Example:
872
873 SLASH , ~
874
875Now you can use "TCP,IP" to add the word "TCP/IP".
876
877Of course, the letter used should itself not appear in any word! The letter
878must be ASCII, thus a single byte.
879
880
Bram Moolenaar6f16eb82005-08-23 21:02:42 +0000881KEEP-CASE WORDS *spell-KEP*
Bram Moolenaarae5bce12005-08-15 21:41:48 +0000882
Bram Moolenaar82cf9b62005-06-07 21:09:25 +0000883In the affix file a KEP line can be used to define the affix name used for
Bram Moolenaar45eeb132005-06-06 21:59:07 +0000884keep-case words. Example:
885
Bram Moolenaar82cf9b62005-06-07 21:09:25 +0000886 KEP = ~
Bram Moolenaar45eeb132005-06-06 21:59:07 +0000887
888See above for an example |spell-affix-vim|.
889
Bram Moolenaare13305e2005-06-19 22:54:15 +0000890
Bram Moolenaar6f16eb82005-08-23 21:02:42 +0000891RARE WORDS *spell-RAR*
Bram Moolenaarae5bce12005-08-15 21:41:48 +0000892
Bram Moolenaar45eeb132005-06-06 21:59:07 +0000893In the affix file a RAR line can be used to define the affix name used for
894rare words. Example:
895
896 RAR ? ~
897
898Rare words are highlighted differently from bad words. This is to be used for
899words that are correct for the language, but are hardly ever used and could be
Bram Moolenaar30abd282005-06-22 22:35:10 +0000900a typing mistake anyway. When the same word is found as good it won't be
901highlighted as rare.
902
903
Bram Moolenaar6f16eb82005-08-23 21:02:42 +0000904BAD WORDS *spell-BAD*
Bram Moolenaarae5bce12005-08-15 21:41:48 +0000905
Bram Moolenaar30abd282005-06-22 22:35:10 +0000906In the affix file a BAD line can be used to define the affix name used for
907bad words. Example:
908
909 BAD ! ~
910
911This can be used to exclude words that would otherwise be good. For example
Bram Moolenaar9a50b1b2005-06-27 22:48:21 +0000912"the the" in the .dic file:
913
914 the the/! ~
915
916Once a word has been marked as bad it won't be undone by encountering the same
917word as good.
Bram Moolenaar45eeb132005-06-06 21:59:07 +0000918
Bram Moolenaar6f16eb82005-08-23 21:02:42 +0000919 *spell-NEEDAFFIX*
Bram Moolenaar8aff23a2005-08-19 20:40:30 +0000920The NEEDAFFIX flag is used to require that a word is used with an affix. The
921word itself is not a good word. Example:
922
923 NEEDAFFIX + ~
924
Bram Moolenaarac6e65f2005-08-29 22:25:38 +0000925 *spell-NEEDCOMPOUND*
The NEEDCOMPOUND flag is used to require that a word is used as part of a
compound word. The word itself is not a good word. Example:
928
929 NEEDCOMPOUND & ~
930
Bram Moolenaar45eeb132005-06-06 21:59:07 +0000931
Bram Moolenaar6f16eb82005-08-23 21:02:42 +0000932COMPOUND WORDS *spell-compound*
Bram Moolenaarae5bce12005-08-15 21:41:48 +0000933
Bram Moolenaar8aff23a2005-08-19 20:40:30 +0000934A compound word is a longer word made by concatenating words that appear in
935the .dic file. To specify which words may be concatenated a character is
936used. This character is put in the list of affixes after the word. We will
937call this character a flag here. Obviously these flags must be different from
938any affix IDs used.
Bram Moolenaarae5bce12005-08-15 21:41:48 +0000939
940 *spell-COMPOUNDFLAG*
941The Myspell compatible method uses one flag, specified with COMPOUNDFLAG.
Bram Moolenaar8aff23a2005-08-19 20:40:30 +0000942All words with this flag combine in any order. This means there is no control
943over which word comes first. Example:
Bram Moolenaarae5bce12005-08-15 21:41:48 +0000944 COMPOUNDFLAG c ~
945
946 *spell-COMPOUNDFLAGS*
Bram Moolenaar8aff23a2005-08-19 20:40:30 +0000947A more advanced method to specify how compound words can be formed uses
948multiple items with multiple flags. This is not compatible with Myspell 3.0.
949Let's start with an example:
950 COMPOUNDFLAGS c+ ~
951 COMPOUNDFLAGS se ~
Bram Moolenaarae5bce12005-08-15 21:41:48 +0000952
Bram Moolenaar8aff23a2005-08-19 20:40:30 +0000953The first line defines that words with the "c" flag can be concatenated in any
954order. The second line defines compound words that are made of one word with
955the "s" flag and one word with the "e" flag. With this dictionary:
956 bork/c ~
957 onion/s ~
958 soup/e ~
Bram Moolenaarae5bce12005-08-15 21:41:48 +0000959
Bram Moolenaar8aff23a2005-08-19 20:40:30 +0000960You can make these words:
961 bork
962 borkbork
963 borkborkbork
964 (etc.)
Bram Moolenaarae5bce12005-08-15 21:41:48 +0000965 onion
966 soup
Bram Moolenaarae5bce12005-08-15 21:41:48 +0000967 onionsoup
Bram Moolenaarae5bce12005-08-15 21:41:48 +0000968
Bram Moolenaar8aff23a2005-08-19 20:40:30 +0000969The COMPOUNDFLAGS item may appear multiple times. The argument is made out of
970one or more groups, where each group can be:
971 one flag e.g., c
972 alternate flags inside [] e.g., [abc]
973Optionally this may be followed by:
974 * the group appears zero or more times, e.g., sm*e
975 + the group appears one or more times, e.g., c+
Bram Moolenaarae5bce12005-08-15 21:41:48 +0000976
Bram Moolenaar8aff23a2005-08-19 20:40:30 +0000977This is similar to the regexp pattern syntax (but not the same!). A few
978examples with the sequence of word flags they require:
979 COMPOUNDFLAGS x+ x xx xxx etc.
980 COMPOUNDFLAGS yz yz
981 COMPOUNDFLAGS x+z xz xxz xxxz etc.
982 COMPOUNDFLAGS yx+ yx yxx yxxx etc.
Bram Moolenaarae5bce12005-08-15 21:41:48 +0000983
Bram Moolenaar8aff23a2005-08-19 20:40:30 +0000984 COMPOUNDFLAGS [abc]z az bz cz
985 COMPOUNDFLAGS [abc]+z az aaz abaz bz baz bcbz cz caz cbaz etc.
986 COMPOUNDFLAGS a[xyz]+ ax axx axyz ay ayx ayzz az azy azxy etc.
987 COMPOUNDFLAGS sm*e se sme smme smmme etc.
988 COMPOUNDFLAGS s[xyz]*e se sxe sxye sxyxe sye syze sze szye szyxe etc.
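
Because the COMPOUNDFLAGS argument resembles a regexp pattern, a quick way to
experiment with it is to translate it into a real regular expression and match
it against the sequence of word flags. The Python sketch below does that. It
assumes single-character flags and is not how Vim stores the rules; the names
compound_rule_to_regex() and compound_allowed() are invented.

```python
import re


def compound_rule_to_regex(rule):
    """Translate one COMPOUNDFLAGS argument into a compiled regex."""
    out = []
    in_class = False
    for ch in rule:
        if ch == "[" and not in_class:
            in_class = True
            out.append("[")
        elif ch == "]" and in_class:
            in_class = False
            out.append("]")
        elif ch in "*+" and not in_class:
            out.append(ch)              # repeat count for the group
        else:
            out.append(re.escape(ch))   # a flag, taken literally
    return re.compile("".join(out))


def compound_allowed(rules, flag_sequence):
    """True when the flags of the words, in order, fit one of the rules."""
    return any(compound_rule_to_regex(r).fullmatch(flag_sequence)
               for r in rules)
```

With the rules "c+" and "se" from the example, the flag sequence "se"
("onionsoup") is allowed, but "es" ("souponion") is not.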
Bram Moolenaarae5bce12005-08-15 21:41:48 +0000989
Bram Moolenaara6c840d2005-08-22 22:59:46 +0000990A specific example: Allow a compound to be made of two words and a dash:
991 In the .aff file:
992 COMPOUNDFLAGS sde ~
993 NEEDAFFIX x ~
994 COMPOUNDMAX 3 ~
995 COMPOUNDMIN 1 ~
996 In the .dic file:
997 start/s ~
998 end/e ~
999 -/xd ~
1000
1001This allows for the word "start-end", but not "startend".
1002
Bram Moolenaarae5bce12005-08-15 21:41:48 +00001003 *spell-COMPOUNDMIN*
Bram Moolenaarac6e65f2005-08-29 22:25:38 +00001004The minimal character length of a word used for compounding is specified with
Bram Moolenaarae5bce12005-08-15 21:41:48 +00001005COMPOUNDMIN. Example:
1006 COMPOUNDMIN 5 ~
1007
When omitted there is no minimal length. Obviously you could just leave out
the compound flag from short words instead; this feature is present for
compatibility with Myspell.
Bram Moolenaarae5bce12005-08-15 21:41:48 +00001011
Bram Moolenaar8aff23a2005-08-19 20:40:30 +00001012 *spell-COMPOUNDMAX*
1013The maximum number of words that can be concatenated into a compound word is
1014specified with COMPOUNDMAX. Example:
1015 COMPOUNDMAX 3 ~
Bram Moolenaarae5bce12005-08-15 21:41:48 +00001016
Bram Moolenaar8aff23a2005-08-19 20:40:30 +00001017When omitted there is no maximum. It applies to all compound words.
Bram Moolenaarae5bce12005-08-15 21:41:48 +00001018
Bram Moolenaar8aff23a2005-08-19 20:40:30 +00001019To set a limit for words with specific flags make sure the items in
1020COMPOUNDFLAGS where they appear don't allow too many words.
Bram Moolenaarae5bce12005-08-15 21:41:48 +00001021
Bram Moolenaar8aff23a2005-08-19 20:40:30 +00001022 *spell-COMPOUNDSYLMAX*
1023The maximum number of syllables that a compound word may contain is specified
1024with COMPOUNDSYLMAX. Example:
1025 COMPOUNDSYLMAX 6 ~
Bram Moolenaarae5bce12005-08-15 21:41:48 +00001026
Bram Moolenaar8aff23a2005-08-19 20:40:30 +00001027This has no effect if there is no SYLLABLE item. Without COMPOUNDSYLMAX there
1028is no limit on the number of syllables.
Bram Moolenaarae5bce12005-08-15 21:41:48 +00001029
Bram Moolenaara6c840d2005-08-22 22:59:46 +00001030If both COMPOUNDMAX and COMPOUNDSYLMAX are defined, a compound word is
1031accepted if it fits one of the criteria, thus is either made from up to
1032COMPOUNDMAX words or contains up to COMPOUNDSYLMAX syllables.
1033
Bram Moolenaar8aff23a2005-08-19 20:40:30 +00001034 *spell-SYLLABLE*
1035The SYLLABLE item defines characters or character sequences that are used to
1036count the number of syllables in a word. Example:
1037 SYLLABLE aáeéiíoóöõuúüûy/aa/au/ea/ee/ei/ie/oa/oe/oo/ou/uu/ui ~
Bram Moolenaarae5bce12005-08-15 21:41:48 +00001038
Bram Moolenaar8aff23a2005-08-19 20:40:30 +00001039Before the first slash is the set of characters that are counted for one
1040syllable, also when repeated and mixed, until the next character that is not
1041in this set. After the slash come sequences of characters that are counted
1042for one syllable. These are preferred over using characters from the set.
1043With the example "ideeen" has three syllables, counted by "i", "ee" and "e".
1044
1045Only case-folded letters need to be included.
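
Counting syllables with a SYLLABLE item, and combining the result with
COMPOUNDMAX and COMPOUNDSYLMAX, can be sketched in Python like this. It
follows the description above: sequences are preferred over single
characters, and a run of characters from the set counts once. The names
count_syllables() and compound_size_ok() are made up for the illustration, and
the word is assumed to be case-folded already.

```python
def count_syllables(word, syllable_item):
    """Count syllables in a case-folded word using a SYLLABLE argument."""
    chars, *sequences = syllable_item.split("/")
    count = 0
    i = 0
    in_run = False      # inside a run of characters from the set
    while i < len(word):
        # Sequences are preferred; use the longest one that matches here.
        best = max((s for s in sequences if s and word.startswith(s, i)),
                   key=len, default="")
        if best:
            count += 1
            in_run = False
            i += len(best)
            continue
        if word[i] not in chars:
            in_run = False
        elif not in_run:
            count += 1      # first character of a run counts once
            in_run = True
        i += 1
    return count


def compound_size_ok(n_words, n_syllables, compoundmax=None,
                     compoundsylmax=None):
    """A compound is accepted when it fits one of the defined limits."""
    if compoundmax is None and compoundsylmax is None:
        return True
    if compoundmax is not None and n_words <= compoundmax:
        return True
    return compoundsylmax is not None and n_syllables <= compoundsylmax
```

For the SYLLABLE example above, count_syllables() counts three syllables in
"ideeen".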
1046
1047Above another way to restrict compounding was mentioned above: adding "nocomp"
1048after an affix causes all words that are made with that affix not be be used
1049for compounding. |spell-affix-nocomp|
1050
Bram Moolenaar6f16eb82005-08-23 21:02:42 +00001051
1052UNLIMITED COMPOUNDING *spell-NOBREAK*
1053
1054For some languages, such as Thai, there is no space in between words. This
1055looks like all words are compounded. To specify this use the NOBREAK item in
1056the affix file, without arguments:
1057 NOBREAK ~
1058
1059Vim will try to figure out where one word ends and a next starts. When there
1060are spelling mistakes this may not be quite right.
1061
Bram Moolenaar8aff23a2005-08-19 20:40:30 +00001062>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
1063NOTE: The following has not been implemented yet, because there are no word
1064lists that support this.
1065> *spell-CMP*
1066> Sometimes it is necessary to change a word when concatenating it to another,
1067> by removing a few letters, inserting something or both. It can also be useful
1068> to restrict concatenation to words that match a pattern. For this purpose CMP
1069> items can be used. They look like this:
1070> CMP {flag} {flags} {strip} {strip2} {add} {cond} {cond2}
1071>
1072> {flag} the flag, as used in COMPOUNDFLAGS for the lead word
1073> {flags} accepted flags for the following word ('.' to accept
1074> all)
1075> {strip} text to remove from the end of the lead word (zero
1076> for no stripping)
1077> {strip2} text to remove from the start of the following word
1078> (zero for no stripping)
1079> {add} text to insert between the words (zero for no
1080> addition)
1081> {cond} condition to match at the end of the lead word
1082> {cond2} condition to match at the start of the following word
1083>
1084> This is the same as what is used for SFX and PFX items, with the extra {flags}
1085> and {cond2} fields. Example:
1086> CMP f mrt 0 - . . ~
1087>
1088> When used with the food and dish word list above, this means that a dash is
1089> inserted after each food item. Thus you get "onion-soup" and
1090> "onion-tomato-salat".
1091>
1092> When there are CMP items for a compound flag the concatenation is only done
1093> when a CMP item matches.
1094>
1095> When there are no CMP items for a compound flag, then all words will be
1096> concatenated, as if there was an item:
1097> CMP {flag} . 0 0 . .
1098>
1099>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
Bram Moolenaarae5bce12005-08-15 21:41:48 +00001100
1101
Bram Moolenaar6f16eb82005-08-23 21:02:42 +00001102REPLACEMENTS *spell-REP*
Bram Moolenaar9ba0eb82005-06-13 22:28:56 +00001103
1104In the affix file REP items can be used to define common mistakes. This is
1105used to make spelling suggestions. The items define the "from" text and the
1106"to" replacement. Example:
1107
1108 REP 4 ~
1109 REP f ph ~
1110 REP ph f ~
1111 REP k ch ~
1112 REP ch k ~
1113
Bram Moolenaar6e7c7f32005-08-24 22:16:11 +00001114The first line specifies the number of REP lines following. Vim ignores the
1115number, but it must be there.
1116
Bram Moolenaard042c562005-06-30 22:04:15 +00001117Don't include simple one-character replacements or swaps. Vim will try these
1118anyway. You can include whole words if you want to, but you might want to use
1119the "file:" item in 'spellsuggest' instead.
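
As an illustration of what REP items express, the Python sketch below
generates candidate words by applying each "from"/"to" pair once at every
position where "from" occurs. How Vim scores and filters suggestions is not
shown, and the name rep_candidates() is hypothetical.

```python
def rep_candidates(word, rep_items):
    """Return candidates made by applying one REP item at one position."""
    seen = set()
    candidates = []
    for frm, to in rep_items:
        start = word.find(frm)
        while start != -1:
            cand = word[:start] + to + word[start + len(frm):]
            if cand not in seen:
                seen.add(cand)
                candidates.append(cand)
            start = word.find(frm, start + 1)
    return candidates
```

With the four REP items of the example, the bad word "fone" yields the
candidate "phone". Each candidate would still have to be looked up in the
word list.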
Bram Moolenaar9ba0eb82005-06-13 22:28:56 +00001120
1121
Bram Moolenaar6f16eb82005-08-23 21:02:42 +00001122SIMILAR CHARACTERS *spell-MAP*
Bram Moolenaar9ba0eb82005-06-13 22:28:56 +00001123
Bram Moolenaard042c562005-06-30 22:04:15 +00001124In the affix file MAP items can be used to define letters that are very much
Bram Moolenaar9ba0eb82005-06-13 22:28:56 +00001125alike. This is mostly used for a letter with different accents. This is used
1126to prefer suggestions with these letters substituted. Example:
1127
1128 MAP 2 ~
1129 MAP eéëêè ~
1130 MAP uüùúû ~
1131
Bram Moolenaar6e7c7f32005-08-24 22:16:11 +00001132The first line specifies the number of MAP lines following. Vim ignores the
1133number, but the line must be there.
Bram Moolenaar9ba0eb82005-06-13 22:28:56 +00001134
Bram Moolenaard042c562005-06-30 22:04:15 +00001135Each letter must appear in only one of the MAP items. It's a bit more
1136efficient if the first letter is ASCII or at least one without accents.
Bram Moolenaare7566042005-06-17 22:00:15 +00001137
Bram Moolenaar9ba0eb82005-06-13 22:28:56 +00001138
Bram Moolenaar6f16eb82005-08-23 21:02:42 +00001139SOUND-A-LIKE *spell-SAL*
Bram Moolenaar9ba0eb82005-06-13 22:28:56 +00001140
1141In the affix file SAL items can be used to define the sounds-a-like mechanism
1142to be used. The main items define the "from" text and the "to" replacement.
Bram Moolenaard042c562005-06-30 22:04:15 +00001143Simplistic example:
Bram Moolenaar9ba0eb82005-06-13 22:28:56 +00001144
1145 SAL CIA X ~
1146 SAL CH X ~
1147 SAL C K ~
1148 SAL K K ~
1149
There are a few rules and this can become quite complicated. An explanation
of how it works can be found in the Aspell manual:
http://aspell.net/man-html/Phonetic-Code.html.
Bram Moolenaar9ba0eb82005-06-13 22:28:56 +00001153
1154There are a few special items:
1155
1156 SAL followup true ~
1157 SAL collapse_result true ~
1158 SAL remove_accents true ~
1159
1160"1" has the same meaning as "true". Any other value means "false".
1161
Bram Moolenaar42eeac32005-06-29 22:40:58 +00001162
Bram Moolenaar6f16eb82005-08-23 21:02:42 +00001163SIMPLE SOUNDFOLDING *spell-SOFOFROM* *spell-SOFOTO*
Bram Moolenaar42eeac32005-06-29 22:40:58 +00001164
1165The SAL mechanism is complex and slow. A simpler mechanism is mapping all
1166characters to another character, mapping similar sounding characters to the
1167same character. At the same time this does case folding. You can not have
Bram Moolenaard042c562005-06-30 22:04:15 +00001168both SAL items and simple soundfolding.
Bram Moolenaar42eeac32005-06-29 22:40:58 +00001169
Bram Moolenaar7d1f5db2005-07-03 21:39:27 +00001170There are two items required: one to specify the characters that are mapped
Bram Moolenaar42eeac32005-06-29 22:40:58 +00001171and one that specifies the characters they are mapped to. They must have
1172exactly the same number of characters. Example:
1173
1174 SOFOFROM abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ ~
1175 SOFOTO ebctefghejklnnepkrstevvkesebctefghejklnnepkrstevvkes ~
1176
In the example all vowels are mapped to the same character 'e'. Another
method would be to leave out all vowels. Some characters that sound nearly
the same and are often mixed up, such as 'm' and 'n', are mapped to the same
character. Don't do this too much, or all words will start looking alike.
Bram Moolenaar42eeac32005-06-29 22:40:58 +00001181
1182Characters that do not appear in SOFOFROM will be left out, except that all
1183white space is replaced by one space. Sequences of the same character in
1184SOFOFROM are replaced by one.
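
Simple soundfolding is easy to try out in Python. The sketch below follows the
description above, with one assumption: repeated characters are collapsed in
the mapped result, which is one way to read the last sentence. The function
name soundfold() mirrors the Vim function, but this is not its implementation.

```python
def soundfold(word, sofofrom, sofoto):
    """Sound-fold a word with SOFOFROM/SOFOTO character tables."""
    if len(sofofrom) != len(sofoto):
        raise ValueError(
            "SOFOFROM and SOFOTO must have the same number of characters")
    table = dict(zip(sofofrom, sofoto))
    out = []
    for ch in word:
        # All white space is replaced by a space; characters that are not
        # in SOFOFROM are left out.
        mapped = " " if ch.isspace() else table.get(ch)
        if mapped is None:
            continue
        if out and out[-1] == mapped:
            continue        # a sequence of the same character becomes one
        out.append(mapped)
    return "".join(out)
```

With the SOFOFROM and SOFOTO lines of the example, "Moon" is folded to "nen":
"M" maps to "n", both "o" characters map to "e" and collapse into one.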
1185
1186You can use the |soundfold()| function to try out the results. Or set the
1187'verbose' option to see the score in the output of the |z?| command.
1188
1189
Bram Moolenaar217ad922005-03-20 22:37:15 +00001190 vim:tw=78:sw=4:ts=8:ft=help:norl: