*spell.txt*	For Vim version 7.0aa. Last change: 2005 Dec 09


		  VIM REFERENCE MANUAL	  by Bram Moolenaar


Spell checking						*spell*

1. Quick start			|spell-quickstart|
2. Remarks on spell checking	|spell-remarks|
3. Generating a spell file	|spell-mkspell|
4. Spell file format		|spell-file-format|

{Vi does not have any of these commands}

Spell checking is not available when the |+syntax| feature has been disabled
at compile time.

==============================================================================
1. Quick start						*spell-quickstart*

This command switches on spell checking: >

	:setlocal spell spelllang=en_us

This switches on the 'spell' option and specifies to check for US English.
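
As a sketch of how this might be made automatic, the following line for your
vimrc switches spell checking on for every buffer with the "mail" filetype.
The choice of filetype and language is only an assumption for illustration,
and filetype detection must be enabled for it to have any effect: >

	:autocmd FileType mail setlocal spell spelllang=en_us

Use ":setlocal nospell" to switch spell checking off again in a buffer.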

The words that are not recognized are highlighted with one of these:
	SpellBad	word not recognized			|hl-SpellBad|
	SpellCap	word not capitalised			|hl-SpellCap|
	SpellRare	rare word				|hl-SpellRare|
	SpellLocal	wrong spelling for selected region	|hl-SpellLocal|

Vim only checks words for spelling, there is no grammar check.

If the 'mousemodel' option is set to "popup" and the cursor is on a badly
spelled word or it is "popup_setpos" and the mouse pointer is on a badly
spelled word, then the popup menu will contain a submenu to replace the bad
word. Note: this slows down the appearance of the popup menu.

To search for the next misspelled word:

							*]s* *E756*
]s			Move to next misspelled word after the cursor.
			A count before the command can be used to repeat.
			'wrapscan' applies.

							*[s*
[s			Like "]s" but search backwards, find the misspelled
			word before the cursor. Doesn't recognize words
			split over two lines, thus may stop at words that are
			not highlighted as bad. Does not stop at word with
			missing capital at the start of a line.

							*]S*
]S			Like "]s" but only stop at bad words, not at rare
			words or words for another region.

							*[S*
[S			Like "]S" but search backwards.


To add words to your own word list:

							*zg*
zg			Add word under the cursor as a good word to the first
			name in 'spellfile'. A count may precede the command
			to indicate the entry in 'spellfile' to be used. A
			count of two uses the second entry.

			In Visual mode the selected characters are added as a
			word (including white space!).
			When the cursor is on text that is marked as badly
			spelled then the marked text is used.
			Otherwise the word under the cursor, separated by
			non-word characters, is used.

			If the word is explicitly marked as bad word in
			another spell file the result is unpredictable.

							*zG*
zG			Like "zg" but add the word to the internal word list
			|internal-wordlist|.

							*zw*
zw			Like "zg" but mark the word as a wrong (bad) word.

							*zW*
zW			Like "zw" but add the word to the internal word list
			|internal-wordlist|.

							*:spe* *:spellgood*
:[count]spe[llgood] {word}
			Add {word} as a good word to 'spellfile', like with
			"zg". Without count the first name is used, with a
			count of two the second entry, etc.

:spe[llgood]! {word}	Add {word} as a good word to the internal word list,
			like with "zG".

							*:spellw* *:spellwrong*
:[count]spellw[rong] {word}
			Add {word} as a wrong (bad) word to 'spellfile', as
			with "zw". Without count the first name is used, with
			a count of two the second entry, etc.

:spellw[rong]! {word}	Add {word} as a wrong (bad) word to the internal word
			list.

After adding a word to 'spellfile' with the above commands its associated
".spl" file will automatically be updated and reloaded. If you change
'spellfile' manually you need to use the |:mkspell| command. This sequence of
commands mostly works well: >
	:edit <file in 'spellfile'>
<	(make changes to the spell file) >
	:mkspell! %

More details about the 'spellfile' format below |spell-wordlist-format|.

							*internal-wordlist*
The internal word list is used for all buffers where 'spell' is set. It is
not stored, it is lost when you exit Vim. It is also cleared when 'encoding'
is set.
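
A possible way to try these commands, assuming 'encoding' is "latin1" and the
directory ~/.vim/spell exists. The file name and the words are made up for
this example, a name in 'spellfile' is expected to end in ".{encoding}.add": >

	:setlocal spell spelllang=en_us
	:setlocal spellfile=~/.vim/spell/en.latin1.add
	" add a good word to the first (here: only) entry in 'spellfile'
	:spellgood Moolenaar
	" mark a word as bad in the same file
	:spellwrong teh
	" only remembered until Vim exits
	:spellgood! foobar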


Finding suggestions for bad words:
							*z=*
z=			For the word under/after the cursor suggest correctly
			spelled words. This also works to find alternatives
			for a word that is not highlighted as a bad word,
			e.g., when the word after it is bad.
			The results are sorted on similarity to the word
			under/after the cursor.
			This may take a long time. Hit CTRL-C when you get
			bored.

			If the command is used without a count the
			alternatives are listed and you can enter the number
			of your choice or press <Enter> if you don't want to
			replace. You can also use the mouse to click on your
			choice (only works if the mouse can be used in Normal
			mode and when there are no line wraps). Click on the
			first line (the header) to cancel.

			If a count is used that suggestion is used, without
			prompting. For example, "1z=" always takes the first
			suggestion.

			If 'verbose' is non-zero a score will be displayed
			with the suggestions to indicate the likeness to the
			badly spelled word (the higher the score the more
			different).
			When a word was replaced the redo command "." will
			repeat the word replacement. This works like "ciw",
			the good word and <Esc>. This does NOT work for Thai
			and other languages without spaces between words.

					*:spellr* *:spellrepall* *E752* *E753*
:spellr[epall]		Repeat the replacement done by |z=| for all matches
			with the replaced word in the current window.
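
For example, a rough sketch of fixing one repeated mistake from the command
line. Taking the first suggestion blindly is only done here to keep the
example short, normally you would pick one from the list with "z=": >

	" jump to the next badly spelled word
	:normal ]s
	" replace it with the first suggestion
	:normal 1z=
	" do the same replacement for the other matches in this window
	:spellrepall

When the bad word does not appear anywhere else ":spellrepall" may report an
error instead of replacing anything.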

In Insert mode, when the cursor is after a badly spelled word, you can use
CTRL-X s to find suggestions. This works like Insert mode completion. Use
CTRL-N to use the next suggestion, CTRL-P to go back. |i_CTRL-X_s|

The 'spellsuggest' option influences how the list of suggestions is generated
and sorted. See |'spellsuggest'|.

The 'spellcapcheck' option is used to check that the first word of a sentence
starts with a capital. This doesn't work for the first word in the file.
When there is a line break right after a sentence the highlighting of the next
line may be postponed. Use |CTRL-L| when needed. Also see |set-spc-auto| for
how it can be set automatically when 'spelllang' is set.

==============================================================================
2. Remarks on spell checking				*spell-remarks*

PERFORMANCE

Vim does on-the-fly spell checking. To make this work fast the word list is
loaded in memory. Thus this uses a lot of memory (1 Mbyte or more). There
might also be a noticeable delay when the word list is loaded, which happens
when 'spell' is set and when 'spelllang' is set while 'spell' was already set.
To minimize the delay each word list is only loaded once, it is not deleted
when 'spelllang' is made empty or 'spell' is reset. When 'encoding' is set
all the word lists are reloaded, thus you may notice a delay then too.


REGIONS

A word may be spelled differently in various regions. For example, English
comes in (at least) these variants:

	en		all regions
	en_au		Australia
	en_ca		Canada
	en_gb		Great Britain
	en_nz		New Zealand
	en_us		USA

Words that are not used in one region but are used in another region are
highlighted with SpellLocal |hl-SpellLocal|.

Always use lowercase letters for the language and region names.

When adding a word with |zg| or another command it's always added for all
regions. You can change that by manually editing the 'spellfile'. See
|spell-wordlist-format|. Note that the regions as specified in the files in
'spellfile' are only used when all entries in 'spelllang' specify the same
region (not counting files specified by their .spl name).
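
Some examples of how the language and region may be selected. Whether a
specific word ends up highlighted depends on the word lists you have
installed: >

	" accept English words from all regions
	:setlocal spelllang=en
	" British English, words only used in other regions get SpellLocal
	:setlocal spelllang=en_gb
	" check two languages at the same time
	:setlocal spelllang=en_us,nl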

							*spell-german*
Specific exception: For German these special regions are used:
	de		all German words accepted
	de_de		old and new spelling
	de_19		old spelling
	de_20		new spelling
	de_at		Austria
	de_ch		Switzerland

							*spell-russian*
Specific exception: For Russian these special regions are used:
	ru		all Russian words accepted
	ru_ru		"IE" letter spelling
	ru_yo		"YO" letter spelling

							*spell-yiddish*
Yiddish requires using "utf-8" encoding, because of the special characters
used. If you are using latin1 Vim will use transliterated (romanized) Yiddish
instead. If you want to use transliterated Yiddish with utf-8 use "yi-tr".
In a table:
	'encoding'	'spelllang'
	utf-8		yi		Yiddish
	latin1		yi		transliterated Yiddish
	utf-8		yi-tr		transliterated Yiddish


SPELL FILES						*spell-load*

Vim searches for spell files in the "spell" subdirectory of the directories in
'runtimepath'. The name is: LL.EEE.spl, where:
	LL	the language name
	EEE	the value of 'encoding'

The value for "LL" comes from 'spelllang', but excludes the region name.
Examples:
	'spelllang'	LL ~
	en_us		en
	en-rare		en-rare
	medical_ca	medical

Only the first file is loaded, the one that is first in 'runtimepath'. If
this succeeds then additionally files with the name LL.EEE.add.spl are loaded.
All the ones that are found are used.

Additionally, the files related to the names in 'spellfile' are loaded. These
are the files that |zg| and |zw| add good and wrong words to.

Exceptions:
- Vim uses "latin1" when 'encoding' is "iso-8859-15". The euro sign doesn't
  matter for spelling.
- When no spell file for 'encoding' is found "ascii" is tried. This only
  works for languages where nearly all words are ASCII, such as English. It
  helps when 'encoding' is not "latin1", such as iso-8859-2, and English text
  is being edited. For the ".add" files the same name as the found main
  spell file is used.

For example, with these values:
	'runtimepath' is "~/.vim,/usr/share/vim70,~/.vim/after"
	'encoding' is "iso-8859-2"
	'spelllang' is "pl"

Vim will look for:
1. ~/.vim/spell/pl.iso-8859-2.spl
2. /usr/share/vim70/spell/pl.iso-8859-2.spl
3. ~/.vim/spell/pl.iso-8859-2.add.spl
4. /usr/share/vim70/spell/pl.iso-8859-2.add.spl
5. ~/.vim/after/spell/pl.iso-8859-2.add.spl

This assumes 1. is not found and 2. is found.

If 'encoding' is "latin1" Vim will look for:
1. ~/.vim/spell/pl.latin1.spl
2. /usr/share/vim70/spell/pl.latin1.spl
3. ~/.vim/after/spell/pl.latin1.spl
4. ~/.vim/spell/pl.ascii.spl
5. /usr/share/vim70/spell/pl.ascii.spl
6. ~/.vim/after/spell/pl.ascii.spl

This assumes none of them are found (Polish doesn't make sense when leaving
out the non-ASCII characters).
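
To see which of these files actually exist on your system, something like
the following can be used. This is only a sketch: "pl" is taken from the
example above and globpath() merely lists the matching files, it does not
repeat the "latin1" and "ascii" fallbacks that Vim applies: >

	:echo globpath(&runtimepath, 'spell/pl.' . &encoding . '.spl')
	:echo globpath(&runtimepath, 'spell/pl.' . &encoding . '.add.spl')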

Spelling for EBCDIC is currently not supported.

A spell file might not be available in the current 'encoding'. See
|spell-mkspell| about how to create a spell file. Converting a spell file
with "iconv" will NOT work!

							*E758* *E759*
When loading a spell file Vim checks that it is properly formatted. If you
get an error the file may be truncated, modified or intended for another Vim
version.


WORDS

Vim uses a fixed method to recognize a word. This is independent of
'iskeyword', so that it also works in help files and for languages that
include characters like '-' in 'iskeyword'. The word characters do depend on
'encoding'.

The table with word characters is stored in the main .spl file. Therefore it
matters what the current locale is when generating it! A .add.spl file does
not contain a word table though.

A word that starts with a digit is always ignored. That includes hex numbers
in the form 0xff and 0XFF.


WORD COMBINATIONS

It is possible to spell-check words that include a space. This is used to
recognize words that are invalid when used by themselves, e.g. for "et al.".
It can also be used to recognize "the the" and highlight it.

The number of spaces is irrelevant. In most cases a line break may also
appear. However, this makes it difficult to find out where to start checking
for spelling mistakes. When you make a change to one line and only that line
is redrawn Vim won't look in the previous line, thus when "et" is at the end
of the previous line "al." will be flagged as an error. And when you type
"the<CR>the" the highlighting doesn't appear until the first line is redrawn.
Use |CTRL-L| to redraw right away. "[s" will also stop at a word combination
with a line break.

When encountering a line break Vim skips characters such as '*', '>' and '"',
so that comments in C, shell and Vim code can be spell checked.


SYNTAX HIGHLIGHTING					*spell-syntax*

Files that use syntax highlighting can specify where spell checking should be
done:

1.  everywhere			   default
2.  in specific items		   use "contains=@Spell"
3.  everywhere but specific items  use "contains=@NoSpell"

For the second method adding the @NoSpell cluster will disable spell checking
again. This can be used, for example, to add @Spell to the comments of a
program, and add @NoSpell for items that shouldn't be checked.
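
A small sketch of the second method for a made-up file type. The group names
"myUrl" and "myComment" are hypothetical: comments are spell checked, but a
URL inside a comment is not: >

	:syntax match myUrl "http://\S\+" contained contains=@NoSpell
	:syntax match myComment "#.*$" contains=@Spell,myUrl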


VIM SCRIPTS

If you want to write a Vim script that does something with spelling, you may
find these functions useful:

    spellbadword()	find badly spelled word at the cursor
    spellsuggest()	get list of spelling suggestions
    soundfold()		get the sound-a-like version of a word
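
A minimal sketch of how these could be used, assuming a spell file for
"en_us" is installed and that the arguments are as in the released Vim 7
(a word and an optional maximum for spellsuggest(), a word for soundfold()).
The words used are arbitrary: >

	:setlocal spell spelllang=en_us
	" the bad word at or after the cursor
	:echo spellbadword()
	" at most five suggestions for a word
	:echo spellsuggest("helo", 5)
	" the sound-folded form of a word
	:echo soundfold("colour")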


SETTING 'spellcapcheck' AUTOMATICALLY			*set-spc-auto*

After the 'spelllang' option has been set successfully, Vim will source the
files "spell/LANG.vim" in 'runtimepath'. "LANG" is the value of 'spelllang'
up to the first comma, dot or underscore. This can be used to set options
specifically for the language, especially 'spellcapcheck'.

The distribution includes a few of these files. Use this command to see what
they do: >
	:next $VIMRUNTIME/spell/*.vim

Note that the default scripts don't set 'spellcapcheck' if it was changed from
the default value. This assumes the user prefers another value then.
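
As an illustration, a file for a hypothetical language "xx" that has no
capital letters could look like this. The file name follows the pattern given
above. Unlike the distributed scripts, this sketch does not check whether
'spellcapcheck' was changed by the user. An empty value switches the check
off: >

	" ~/.vim/spell/xx.vim
	setlocal spellcapcheck=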


DOUBLE SCORING						*spell-double-scoring*

The 'spellsuggest' option can be used to select "double" scoring. This
mechanism is based on the principle that there are two kinds of spelling
mistakes:

1. You know how to spell the word, but mistype something. This results in a
   small editing distance (character swapped/omitted/inserted) and possibly a
   word that sounds completely different.

2. You don't know how to spell the word and type something that sounds right.
   The edit distance can be big but the word is similar after sound-folding.

Since scores for these two mistakes will be very different we use a list
for each and mix them.

The sound-folding is slow and people that know the language won't make the
second kind of mistakes. Therefore 'spellsuggest' can be set to select the
preferred method for scoring the suggestions.
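
For example, to select the scoring method (see |'spellsuggest'| for the
values that your version of Vim accepts, the number is an arbitrary limit on
the number of suggestions listed): >

	" both edit distance and sound-folding, slower
	:set spellsuggest=double,10
	" only simple changes are considered, faster
	:set spellsuggest=fast,10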

==============================================================================
3. Generating a spell file				*spell-mkspell*

Vim uses a binary file format for spelling. This greatly speeds up loading
the word list and keeps it small.
							*.aff* *.dic* *Myspell*
You can create a Vim spell file from the .aff and .dic files that Myspell
uses. Myspell is used by OpenOffice.org and Mozilla. You should be able to
find them here:
	http://lingucomponent.openoffice.org/spell_dic.html
You can also use a plain word list. The results are the same, the choice
depends on what word lists you can find.

If you install Aap (from www.a-a-p.org) you can use the recipes in the
runtime/spell/??/ directories. Aap will take care of downloading the files,
applying the patches needed for Vim and building the .spl file.

Make sure your current locale is set properly, otherwise Vim doesn't know what
characters are upper/lower case letters. If the locale isn't available (e.g.,
when using an MS-Windows codepage on Unix) add tables to the .aff file
|spell-affix-chars|. If the .aff file doesn't define a table then the word
table of the currently active spelling is used. If spelling is not active
then Vim will try to guess.

							*:mksp* *:mkspell*
:mksp[ell][!] [-ascii] {outname} {inname} ...
			Generate a Vim spell file from word lists. Example: >
		:mkspell /tmp/nl nl_NL.words
<								*E751*
			When {outname} ends in ".spl" it is used as the output
			file name. Otherwise it should be a language name,
			such as "en", without the region name. The file
			written will be "{outname}.{encoding}.spl", where
			{encoding} is the value of the 'encoding' option.

			When the output file already exists [!] must be used
			to overwrite it.

			When the [-ascii] argument is present, words with
			non-ascii characters are skipped. The resulting file
			ends in "ascii.spl".

			The input can be the Myspell format files {inname}.aff
			and {inname}.dic. If {inname}.aff does not exist then
			{inname} is used as the file name of a plain word
			list.

			Multiple {inname} arguments can be given to combine
			regions into one Vim spell file. Example: >
		:mkspell ~/.vim/spell/en /tmp/en_US /tmp/en_CA /tmp/en_AU
<			This combines the English word lists for US, CA and AU
			into one en.spl file.
			Up to eight regions can be combined. *E754* *E755*
			The REP and SAL items of the first .aff file where
			they appear are used. |spell-REP| |spell-SAL|

			This command uses a lot of memory, required to find
			the optimal word tree (Polish, Italian and Hungarian
			require several hundred Mbyte). The final result will
			be much smaller, because compression is used. To
			avoid running out of memory compression will be done
			now and then. This can be tuned with the 'mkspellmem'
			option.

			After the spell file was written and it was being used
			in a buffer it will be reloaded automatically.

:mksp[ell] [-ascii] {name}.{enc}.add
			Like ":mkspell" above, using {name}.{enc}.add as the
			input file and producing an output file in the same
			directory that has ".spl" appended.

:mksp[ell] [-ascii] {name}
			Like ":mkspell" above, using {name} as the input file
			and producing an output file in the same directory
			that has ".{enc}.spl" appended.
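
For example, assuming 'encoding' is "latin1" and the word lists were edited
by hand (the file names are only an illustration): >

	" writes ~/.vim/spell/en.latin1.add.spl
	:mkspell! ~/.vim/spell/en.latin1.add
	" writes /tmp/words.latin1.spl
	:mkspell /tmp/words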

Vim will report the number of duplicate words. This might be a mistake in the
list of words. But sometimes it is used to have different prefixes and
suffixes for the same basic word to avoid them combining (e.g. Czech uses
this). If you want Vim to report all duplicate words set the 'verbose'
option.

Since you might want to change a Myspell word list for use with Vim the
following procedure is recommended:

1. Obtain the xx_YY.aff and xx_YY.dic files from Myspell.
2. Make a copy of these files to xx_YY.orig.aff and xx_YY.orig.dic.
3. Change the xx_YY.aff and xx_YY.dic files to remove bad words, add missing
   words, define word characters with FOL/LOW/UPP, etc. The distributed
   "src/spell/*.diff" files can be used.
4. Start Vim with the right locale and use |:mkspell| to generate the Vim
   spell file.
5. Try out the spell file with ":set spell spelllang=xx" if you wrote it in
   a spell directory in 'runtimepath', or ":set spelllang=xx.enc.spl" if you
   wrote it somewhere else.
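
Steps 4 and 5 might then look like this, with "xx" and "xx_YY" standing for
the real language and file names, and assuming ~/.vim is in 'runtimepath': >

	:mkspell! ~/.vim/spell/xx xx_YY
	:set spell spelllang=xx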

When the Myspell files are updated you can merge the differences:
1. Obtain the new Myspell files as xx_YY.new.aff and xx_YY.new.dic.
2. Use Vimdiff to see what changed: >
	vimdiff xx_YY.orig.dic xx_YY.new.dic
3. Take over the changes you like in xx_YY.dic.
   You may also need to change xx_YY.aff.
4. Rename xx_YY.new.dic to xx_YY.orig.dic and xx_YY.new.aff to xx_YY.orig.aff.


SPELL FILE VERSIONS					*E770* *E771* *E772*

Spell checking is a relatively new feature in Vim, thus it's possible that the
.spl file format will be changed to support more languages. Vim will check
the validity of the spell file and report anything wrong.

	E771: Old spell file, needs to be updated ~
This spell file is older than your Vim. You need to update the .spl file.

	E772: Spell file is for newer version of Vim ~
This means the spell file was made for a later version of Vim. You need to
update Vim.

	E770: Unsupported section in spell file ~
This means the spell file was made for a later version of Vim and contains a
section that is required for the spell file to work. In this case it's
probably a good idea to upgrade your Vim.


SPELL FILE DUMP

If for some reason you want to check what words are supported by the currently
used spelling files, use this command:

							*:spelldump* *:spelld*
:spelld[ump]		Open a new window and fill it with all currently valid
			words. Compound words are not included.
			Note: For some languages the result may be enormous,
			causing Vim to run out of memory.

The words are listed in the word list format |spell-wordlist-format|. You
should be able to read the result with ":mkspell" to generate one .spl file
that includes all the words.
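
A sketch of how that could be done, with /tmp/all.words as an arbitrary file
name. The resulting .spl file is named after the current 'encoding': >

	:spelldump
	:write /tmp/all.words
	:mkspell! /tmp/all /tmp/all.words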

When all entries to 'spelllang' use the same regions or no regions at all then
the region information is included in the dumped words. Otherwise only words
for the current region are included and no "/regions" line is generated.

Comment lines with the name of the .spl file are used as a header above the
words that were generated from that .spl file.
Bram Moolenaar3b506942005-06-23 22:36:45 +0000544
==============================================================================
4. Spell file format                                    *spell-file-format*

This is the format of the files that are used by the person who creates and
maintains a word list.

Note that we avoid the word "dictionary" here. That is because the goal of
spell checking differs from writing a dictionary (as in the book). For
spelling we need a list of words that are OK, thus should not be highlighted.
Person and company names will not appear in a dictionary, but do appear in a
word list. And some old words are rarely used while they are common
misspellings. These do appear in a dictionary but not in a word list.

There are two formats: A straight list of words and a list using affix
compression. The files with affix compression are used by Myspell (Mozilla
and OpenOffice.org). This requires two files, one with .aff and one with .dic
extension.


FORMAT OF STRAIGHT WORD LIST                            *spell-wordlist-format*

The words must appear one per line. That is all that is required.

Additionally the following items are recognized:

- Empty and blank lines are ignored.

- Lines starting with a # are ignored (comment lines).

- A line starting with "/encoding=", before any word, specifies the encoding
  of the file. After the '=' comes an encoding name. This tells Vim
  to set up conversion from the specified encoding to 'encoding'. Thus you can
  use one word list for several target encodings.

- A line starting with "/regions=" specifies the region names that are
  supported. Each region name must be two ASCII letters. The first one is
  region 1. Thus "/regions=usca" has region 1 "us" and region 2 "ca".
  In an addition word list the region names should be equal to the main word
  list!

- Other lines starting with '/' are reserved for future use. The ones that
  are not recognized are ignored (but you do get a warning message).

- A "/" may follow the word with the following items:
    =           Case must match exactly.
    ?           Rare word.
    !           Bad (wrong) word.
    digit       A region in which the word is valid. If no regions are
                specified the word is valid in all regions.

Example:

        # This is an example word list          comment
        /encoding=latin1                        encoding of the file
        /regions=uscagb                         regions "us", "ca" and "gb"
        example                                 word for all regions
        blah/12                                 word for regions "us" and "ca"
        vim/!                                   bad word
        Campbell/?3                             rare word in region 3 "gb"
        's mornings/=                           keep-case word

Note that when "/=" is used the same word with all upper-case letters is not
accepted. This is different from a word with mixed case that is automatically
marked as keep-case, those words may appear in all upper-case letters.

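The rules above can be sketched as a small reader for a straight word list.
This is only an illustration in Python, not how Vim implements it. The
function name parse_word_list() and the dictionary keys are made up for this
sketch. It assumes each line holds only the word and its items, without the
explanatory column shown in the example.

```python
def parse_word_list(lines):
    """Parse lines of a straight word list into (header, entries)."""
    header = {"encoding": None, "regions": []}
    entries = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue  # blank lines and comment lines are ignored
        if line.startswith("/encoding="):
            header["encoding"] = line[len("/encoding="):]
            continue
        if line.startswith("/regions="):
            names = line[len("/regions="):]
            # Region names are two ASCII letters each; the first is region 1.
            header["regions"] = [names[i:i + 2] for i in range(0, len(names), 2)]
            continue
        if line.startswith("/"):
            continue  # reserved for future use (Vim gives a warning here)
        word, _, items = line.partition("/")
        entries.append({
            "word": word,
            "keepcase": "=" in items,
            "rare": "?" in items,
            "bad": "!" in items,
            # An empty list stands for "valid in all regions".
            "regions": [int(c) for c in items if c in "0123456789"],
        })
    return header, entries
```

With the example list above, "blah/12" yields regions [1, 2] and
"Campbell/?3" yields a rare word for region 3.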

FORMAT WITH AFFIX COMPRESSION

There are two files: the basic word list and an affix file. The affixes are
used to modify the basic words to get the full word list. This significantly
reduces the number of words, especially for a language like Polish. This is
called affix compression.

The basic word list and the affix file are combined and turned into a binary
spell file. All the preprocessing has been done, thus this file loads fast.
The binary spell file format is described in the source code (src/spell.c).
But only developers need to know about it.

The preprocessing also allows us to take the Myspell language files and modify
them before the Vim word list is made. The tools for this can be found in the
"src/spell" directory.

The format for the affix and word list files is based on what Myspell uses
(the spell checker of Mozilla and OpenOffice.org). A description can be found
here:
        http://lingucomponent.openoffice.org/affix.readme ~
Note that affixes are case sensitive, this isn't obvious from the description.

Vim does not use the TRY item, it is ignored. For making suggestions the
possible characters in the words are used.

Vim supports quite a few extras. They are described below |spell-affix-vim|.
Attempts have been made to keep this compatible with other spell checkers, so
that the same files can be used.


WORD LIST FORMAT                                        *spell-dic-format*

A very short example, with line numbers:

        1       1234
        2       aan
        3       Als
        4       Etten-Leur
        5       et al.
        6       's-Gravenhage
        7       's-Gravenhaags
        8       bedel/P
        9       kado/1
        10      cadeau/2
        11      TCP,IP

The first line contains the number of words. Vim ignores it, but you do get
an error message if it's not there. *E760*

What follows is one word per line. There should be no white space before or
after the word. After the word there is an optional slash and flags. Most of
these flags are letters that indicate the affixes that can be used with this
word. These are specified with SFX and PFX lines in the .aff file. See the
Myspell documentation. Vim allows using other flag types with the FLAG item
in the affix file |spell-FLAG|.

When the word only has lower-case letters it will also match with the word
starting with an upper-case letter.

When the word includes an upper-case letter, this means the upper-case letter
is required at this position. The same word with a lower-case letter at this
position will not match. When some of the other letters are upper-case it will
not match either.

The word with all upper-case characters will always be OK.

        word list       matches                 does not match ~
        als             als Als ALS             ALs AlS aLs aLS
        Als             Als ALS                 als ALs AlS aLs aLS
        ALS             ALS                     als Als ALs AlS aLs aLS
        AlS             AlS ALS                 als Als ALs aLs aLS

The KEP affix ID can be used to specifically match a word with identical case
only, see below |spell-KEP|.

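The case rules in the table above can be written down as a short check. This
is a sketch in Python, assuming words without special upper-case forms (such
as the German sharp s) and without the KEP flag. The name case_matches() is
made up for this illustration.

```python
def case_matches(entry, word):
    """Return True if `word` is an accepted spelling of word-list `entry`."""
    if word.lower() != entry.lower():
        return False            # different letters, not just different case
    if word == entry:
        return True             # identical case always matches
    if word == entry.upper():
        return True             # the all upper-case form is always OK
    if entry == entry.lower():
        # All lower-case entry: also accept an upper-case first letter.
        return word == entry[0].upper() + entry[1:]
    return False
```

For the entry "als" this accepts "als", "Als" and "ALS", and rejects "ALs",
"AlS", "aLs" and "aLS", as in the first row of the table.
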
Note in lines 5 to 7 that non-word characters are used. You can include
any character in a word. When checking the text a word still only matches
when it appears with a non-word character before and after it. For Myspell a
word starting with a non-word character probably won't work.

In line 11 the word "TCP/IP" is defined. Since the slash has a special
meaning the comma is used instead. This is defined with the SLASH item in the
affix file, see |spell-SLASH|. Note that without this SLASH item the
word will be "TCP,IP".

                                                        *spell-affix-vim*
A flag that Vim adds and is not in Myspell is the flag defined with KEP in the
affix file. This has the meaning that case matters. This can be used if the
word does not have the first letter in upper case at the start of a sentence.
Example (assuming that = was used for KEP):

        word list       matches                 does not match ~
        's morgens/=    's morgens              'S morgens 's Morgens 'S MORGENS
        's Morgens      's Morgens 'S MORGENS   'S morgens 's morgens

The flag can also be used to prevent the word from matching when it is in all
upper-case letters.

                                                        *spell-affix-mbyte*
The basic word list is normally in an 8-bit encoding, which is mentioned in
the affix file. The affix file must always be in the same encoding as the
word list. This is compatible with Myspell. For Vim the encoding may also be
something else, any encoding that "iconv" supports. The "SET" line must
specify the name of the encoding. When using a multi-byte encoding it's
possible to use more different affixes (but Myspell doesn't support that, thus
you may not want to use it anyway).


CHARACTER TABLES
                                                        *spell-affix-chars*
When using an 8-bit encoding (as specified with SET) the affix file should
define what characters are word characters. This is because the system where
":mkspell" is used may not support a locale with this encoding and isalpha()
won't work. For example when using "cp1250" on Unix.

                                                *E761* *E762* *spell-FOL*
                                                *spell-LOW* *spell-UPP*
Three lines in the affix file are needed. Simplistic example:

        FOL áëñ ~
        LOW áëñ ~
        UPP ÁËÑ ~

All three lines must have exactly the same number of characters.

The "FOL" line specifies the case-folded characters. These are used to
compare words while ignoring case. For most encodings this is identical to
the lower case line.

The "LOW" line specifies the characters in lower-case. Mostly it's equal to
the "FOL" line.

The "UPP" line specifies the characters in upper-case. That is, a character
is upper-case where it's different from the character at the same position in
"FOL".

An exception is made for the German sharp s ß. The upper-case version is
"SS". In the FOL/LOW/UPP lines it should be included, so that it's recognized
as a word character, but use the ß character in all three.

ASCII characters should be omitted, Vim always handles these in the same way.
When the encoding is UTF-8 no word characters need to be specified.

                                                        *E763*
Vim allows you to use spell checking for several languages in the same file.
You can list them in the 'spelllang' option. As a consequence all spell files
for the same encoding must use the same word characters, otherwise they can't
be combined without errors. If you get a warning that the word tables differ
you may need to generate the .spl file again with |:mkspell|. Check the FOL,
LOW and UPP lines in the used .aff file.

The XX.ascii.spl spell file generated with the "-ascii" argument will not
contain the table with characters, so that it can be combined with spell files
for any encoding. The .add.spl files also do not contain the table.


MID-WORD CHARACTERS
                                                        *spell-midword*
Some characters are only to be considered word characters if they are used in
between two ordinary word characters. An example is the single quote: It is
often used to put text in quotes, thus it can't be recognized as a word
character, but when it appears in between word characters it must be part of
the word. This is needed to detect a spelling error such as they'are. That
should be they're, but since "they" and "are" are words themselves that would
go unnoticed.

These characters are defined with MIDWORD in the .aff file:

        MIDWORD '- ~

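The effect of MIDWORD on splitting text into words can be illustrated with a
regular expression. This Python sketch is not Vim's word-splitting code. It
assumes that ordinary word characters are simply letters, and the function
name split_words() is made up here.

```python
import re

def split_words(text, midword="'-"):
    """Split text into words; MIDWORD characters count only between letters."""
    letters = r"[^\W\d_]+"                      # one or more letters
    mid = "[" + re.escape(midword) + "]"        # one mid-word character
    pattern = re.compile(letters + "(?:" + mid + letters + ")*")
    return pattern.findall(text)
```

The quotes around a phrase are dropped, while "they'are" stays a single word
and can then be flagged as a spelling error.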

FLAG TYPES                                              *spell-FLAG*

Flags are used to specify the affixes that can be used with a word and for
other properties of the word. Normally single-character flags are used. This
limits the number of possible flags, especially for 8-bit encodings. The FLAG
item can be used if more affixes are to be used. Possible values:

        FLAG long       use two-character flags
        FLAG num        use numbers, from 1 up to 65000
        FLAG caplong    use one-character flags without A-Z and two-character
                        flags that start with A-Z

With "FLAG num" the numbers in a list of affixes need to be separated with a
comma: "234,2143,1435". This method is inefficient, but useful if the file is
generated with a program.

When using "caplong" the two-character flags all start with a capital: "Aa",
"B1", "BB", etc. This is useful to use one-character flags for the most
common items and two-character flags for uncommon items.

Note: When using utf-8 only characters up to 65000 may be used for flags.

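How the flag text after the slash is split for each FLAG type can be sketched
as follows. This is an illustration in Python under the rules listed above,
with no error checking. The function name split_flags() and the use of None
for the default single-character type are assumptions of this sketch.

```python
def split_flags(flags, flag_type=None):
    """Split the flag text after the slash of a .dic entry into flags."""
    if flag_type == "num":
        # Numbers separated by commas, e.g. "234,2143,1435".
        return [int(n) for n in flags.split(",") if n]
    if flag_type == "long":
        # Every flag is two characters.
        return [flags[i:i + 2] for i in range(0, len(flags), 2)]
    if flag_type == "caplong":
        # A-Z starts a two-character flag, anything else is one character.
        result, i = [], 0
        while i < len(flags):
            width = 2 if "A" <= flags[i] <= "Z" else 1
            result.append(flags[i:i + width])
            i += width
        return result
    return list(flags)          # default: single-character flags
```

For example, with "caplong" the text "AaB1xBB" splits into "Aa", "B1", "x"
and "BB".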

AFFIXES
                                                *spell-PFX* *spell-SFX*
The usual PFX (prefix) and SFX (suffix) lines are supported (see the Myspell
documentation or the Aspell manual:
http://aspell.net/man-html/Affix-Compression.html).

Note that Myspell ignores any extra text after the relevant info. Vim
requires this text to start with a "#" so that mistakes don't go unnoticed.
Example:

        SFX F 0 in   [^i]n      # Spion > Spionin ~
        SFX F 0 nen  in         # Bauerin > Bauerinnen ~

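To show what the two SFX lines above do, here is a sketch in Python that
applies a single suffix rule to a word. It follows the Myspell field order
(text to strip, text to add, condition) and assumes the condition only uses
literal characters, "." and bracket expressions, so that it can be handed to
a regular expression engine unchanged. The name apply_suffix() is made up.

```python
import re

def apply_suffix(word, strip, add, condition):
    """Apply one SFX rule; return the new word or None if it doesn't apply."""
    strip = "" if strip == "0" else strip       # "0" means: strip nothing
    add = "" if add == "0" else add             # "0" means: add nothing
    if condition != "." and not re.search("(?:" + condition + ")$", word):
        return None                             # condition must match the end
    if strip and not word.endswith(strip):
        return None
    return word[:len(word) - len(strip)] + add
```

With the first rule "Spion" becomes "Spionin"; that rule does not apply to
"Bauerin", which is handled by the second rule and becomes "Bauerinnen".
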
Apparently Myspell allows an affix name to appear more than once. Since this
might also be a mistake, Vim checks for an extra "S". The affix files for
Myspell that use this feature apparently have this flag. Example:

        SFX a Y 1 S ~
        SFX a 0 an . ~

        SFX a Y 2 S ~
        SFX a 0 en . ~
        SFX a 0 on . ~

                                                        *spell-affix-rare*
An extra item for Vim is the "rare" flag. It must come after the other
fields, before a comment. When used then all words that use the affix will be
marked as rare words. Example:

        PFX F 0 nene . rare ~
        SFX F 0 oin n rare  # hardly ever used ~

However, if the word also appears as a good word in another way it won't be
marked as rare.

                                                        *spell-affix-nocomp*
Another extra item for Vim is the "nocomp" flag. It must come after the other
fields, before a comment. It can be either before or after "rare". When
present then all words that use the affix will not be part of a compound word.
Example:
        affix file:
                COMPOUNDFLAG c ~
                SFX a Y 2 ~
                SFX a 0 s   . ~
                SFX a 0 ize . nocomp ~
        dictionary:
                word/c ~
                util/ac ~

This allows for "wordutil" and "wordutils" but not "wordutilize".

                                                        *spell-PFXPOSTPONE*
When an affix file has very many prefixes that apply to many words it's not
possible to build the whole word list in memory. This applies to Hebrew (a
list with all words is over a Gbyte). In that case applying prefixes must be
postponed. This makes spell checking slower. It is indicated by this keyword
in the .aff file:

        PFXPOSTPONE ~

Only prefixes without a chop string can be postponed, prefixes with a chop
string will still be included in the word list. An exception is when the chop
string is one character and equal to the last character of the added string,
but in lower case. Thus when the chop string is used to allow the following
word to start with an upper case letter.


WORDS WITH A SLASH                                      *spell-SLASH*

The slash is used in the .dic file to separate the basic word from the affix
letters that can be used. Unfortunately, this means you cannot use a slash in
a word. Thus "TCP/IP" cannot be a word. To work around that you can define a
replacement character for the slash. Example:

        SLASH , ~

Now you can use "TCP,IP" to add the word "TCP/IP".

Of course, the letter used should itself not appear in any word! The letter
must be ASCII, thus a single byte.


KEEP-CASE WORDS                                         *spell-KEP*

In the affix file a KEP line can be used to define the affix name used for
keep-case words. Example:

        KEP = ~

See above for an example |spell-affix-vim|.


RARE WORDS                                              *spell-RAR*

In the affix file a RAR line can be used to define the affix name used for
rare words. Example:

        RAR ? ~

Rare words are highlighted differently from bad words. This is to be used for
words that are correct for the language, but are hardly ever used and could be
a typing mistake anyway. When the same word is found as good it won't be
highlighted as rare.


BAD WORDS                                               *spell-BAD*

In the affix file a BAD line can be used to define the affix name used for
bad words. Example:

        BAD ! ~

This can be used to exclude words that would otherwise be good. For example
"the the" in the .dic file:

        the the/! ~

Once a word has been marked as bad it won't be undone by encountering the same
word as good.

                                                        *spell-NEEDAFFIX*
The NEEDAFFIX flag is used to require that a word is used with an affix. The
word itself is not a good word. Example:

        NEEDAFFIX + ~

                                                        *spell-NEEDCOMPOUND*
The NEEDCOMPOUND flag is used to require that a word is used as part of a
compound word. The word itself is not a good word. Example:

        NEEDCOMPOUND & ~


COMPOUND WORDS                                          *spell-compound*

A compound word is a longer word made by concatenating words that appear in
the .dic file. To specify which words may be concatenated a character is
used. This character is put in the list of affixes after the word. We will
call this character a flag here. Obviously these flags must be different from
any affix IDs used.

                                                        *spell-COMPOUNDFLAG*
The Myspell compatible method uses one flag, specified with COMPOUNDFLAG.
All words with this flag combine in any order. This means there is no control
over which word comes first. Example:
        COMPOUNDFLAG c ~

                                                        *spell-COMPOUNDFLAGS*
A more advanced method to specify how compound words can be formed uses
multiple items with multiple flags. This is not compatible with Myspell 3.0.
Let's start with an example:
        COMPOUNDFLAGS c+ ~
        COMPOUNDFLAGS se ~

The first line defines that words with the "c" flag can be concatenated in any
order. The second line defines compound words that are made of one word with
the "s" flag and one word with the "e" flag. With this dictionary:
        bork/c ~
        onion/s ~
        soup/e ~

You can make these words:
        bork
        borkbork
        borkborkbork
        (etc.)
        onion
        soup
        onionsoup

The COMPOUNDFLAGS item may appear multiple times. The argument is made out of
one or more groups, where each group can be:
        one flag                        e.g., c
        alternate flags inside []       e.g., [abc]
Optionally this may be followed by:
        *       the group appears zero or more times, e.g., sm*e
        +       the group appears one or more times, e.g., c+

This is similar to the regexp pattern syntax (but not the same!). A few
examples with the sequence of word flags they require:
    COMPOUNDFLAGS x+        x xx xxx etc.
    COMPOUNDFLAGS yz        yz
    COMPOUNDFLAGS x+z       xz xxz xxxz etc.
    COMPOUNDFLAGS yx+       yx yxx yxxx etc.

    COMPOUNDFLAGS [abc]z    az bz cz
    COMPOUNDFLAGS [abc]+z   az aaz abaz bz baz bcbz cz caz cbaz etc.
    COMPOUNDFLAGS a[xyz]+   ax axx axyz ay ayx ayzz az azy azxy etc.
    COMPOUNDFLAGS sm*e      se sme smme smmme etc.
    COMPOUNDFLAGS s[xyz]*e  se sxe sxye sxyxe sye syze sze szye szyxe etc.

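Since the COMPOUNDFLAGS argument is close to regexp syntax, one way to check
a sequence of word flags against it is to translate the argument into a real
regular expression. The Python sketch below does that. It assumes
single-character flags and that "*", "+", "[" and "]" are not themselves used
as compound flags. The function names are made up for this illustration.

```python
import re

def compound_pattern(item):
    """Translate one COMPOUNDFLAGS argument into a compiled regular expression."""
    parts, i = [], 0
    while i < len(item):
        if item[i] == "[":
            end = item.index("]", i)
            # Alternate flags: any one of the flags inside the brackets.
            group = "(?:" + "|".join(re.escape(f) for f in item[i + 1:end]) + ")"
            i = end + 1
        else:
            group = re.escape(item[i])          # one flag
            i += 1
        if i < len(item) and item[i] in "*+":
            group += item[i]                    # zero-or-more / one-or-more
            i += 1
        parts.append(group)
    return re.compile("".join(parts))

def flags_allowed(item, flag_sequence):
    """True if the sequence of word flags is accepted by the item."""
    return compound_pattern(item).fullmatch(flag_sequence) is not None
```

With the dictionary above, "onionsoup" has the flag sequence "se", which is
accepted by "COMPOUNDFLAGS se", while "souponion" ("es") is not.
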
A specific example: Allow a compound to be made of two words and a dash:
        In the .aff file:
                COMPOUNDFLAGS sde ~
                NEEDAFFIX x ~
                COMPOUNDMAX 3 ~
                COMPOUNDMIN 1 ~
        In the .dic file:
                start/s ~
                end/e ~
                -/xd ~

This allows for the word "start-end", but not "startend".

                                                        *spell-COMPOUNDMIN*
The minimal character length of a word used for compounding is specified with
COMPOUNDMIN. Example:
        COMPOUNDMIN 5 ~

When omitted there is no minimal length. Obviously you could just leave out
the compound flag from short words instead, this feature is present for
compatibility with Myspell.

                                                        *spell-COMPOUNDMAX*
The maximum number of words that can be concatenated into a compound word is
specified with COMPOUNDMAX. Example:
        COMPOUNDMAX 3 ~

When omitted there is no maximum. It applies to all compound words.

To set a limit for words with specific flags make sure the items in
COMPOUNDFLAGS where they appear don't allow too many words.

                                                        *spell-COMPOUNDSYLMAX*
The maximum number of syllables that a compound word may contain is specified
with COMPOUNDSYLMAX. Example:
        COMPOUNDSYLMAX 6 ~

This has no effect if there is no SYLLABLE item. Without COMPOUNDSYLMAX there
is no limit on the number of syllables.

If both COMPOUNDMAX and COMPOUNDSYLMAX are defined, a compound word is
accepted if it fits one of the criteria, thus is either made from up to
COMPOUNDMAX words or contains up to COMPOUNDSYLMAX syllables.

                                                        *spell-SYLLABLE*
The SYLLABLE item defines characters or character sequences that are used to
count the number of syllables in a word. Example:
        SYLLABLE aáeéiíoóöõuúüûy/aa/au/ea/ee/ei/ie/oa/oe/oo/ou/uu/ui ~

Before the first slash is the set of characters that are counted for one
syllable, also when repeated and mixed, until the next character that is not
in this set. After the slash come sequences of characters that are counted
for one syllable. These are preferred over using characters from the set.
With the example "ideeen" has three syllables, counted by "i", "ee" and "e".

Only case-folded letters need to be included.

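The syllable counting described for SYLLABLE can be sketched like this. It is
a Python illustration of the rules as stated above, not a copy of Vim's code.
It assumes the word has already been case-folded and that the longest
matching sequence is taken when several match. The name count_syllables() is
made up.

```python
def count_syllables(word, syllable_item):
    """Count syllables in `word` using the argument of a SYLLABLE item."""
    chars, *sequences = syllable_item.split("/")
    count, i, in_run = 0, 0, False
    while i < len(word):
        # Sequences are preferred; take the longest one that matches here.
        match = max((s for s in sequences if s and word.startswith(s, i)),
                    key=len, default="")
        if match:
            count += 1
            i += len(match)
            in_run = False
        else:
            if word[i] not in chars:
                in_run = False          # the run of syllable characters ends
            elif not in_run:
                count += 1              # first character of a new run
                in_run = True
            i += 1
    return count
```

With the SYLLABLE example above, "ideeen" gives three: "i" from the set, "ee"
as a sequence and the final "e" from the set again.
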
Another way to restrict compounding was mentioned above: adding "nocomp"
after an affix causes all words that are made with that affix not to be used
for compounding. |spell-affix-nocomp|


UNLIMITED COMPOUNDING                                   *spell-NOBREAK*

For some languages, such as Thai, there is no space in between words. This
looks like all words are compounded. To specify this use the NOBREAK item in
the affix file, without arguments:
        NOBREAK ~

Vim will try to figure out where one word ends and a next starts. When there
are spelling mistakes this may not be quite right.

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
NOTE: The following has not been implemented yet, because there are no word
lists that support this.
>                                                       *spell-CMP*
> Sometimes it is necessary to change a word when concatenating it to another,
> by removing a few letters, inserting something or both. It can also be useful
> to restrict concatenation to words that match a pattern. For this purpose CMP
> items can be used. They look like this:
>       CMP {flag} {flags} {strip} {strip2} {add} {cond} {cond2}
>
>       {flag}          the flag, as used in COMPOUNDFLAGS for the lead word
>       {flags}         accepted flags for the following word ('.' to accept
>                       all)
>       {strip}         text to remove from the end of the lead word (zero
>                       for no stripping)
>       {strip2}        text to remove from the start of the following word
>                       (zero for no stripping)
>       {add}           text to insert between the words (zero for no
>                       addition)
>       {cond}          condition to match at the end of the lead word
>       {cond2}         condition to match at the start of the following word
>
> This is the same as what is used for SFX and PFX items, with the extra {flags}
> and {cond2} fields. Example:
>       CMP f mrt 0 - . . ~
>
> When used with the food and dish word list above, this means that a dash is
> inserted after each food item. Thus you get "onion-soup" and
> "onion-tomato-salat".
>
> When there are CMP items for a compound flag the concatenation is only done
> when a CMP item matches.
>
> When there are no CMP items for a compound flag, then all words will be
> concatenated, as if there was an item:
>       CMP {flag} . 0 0 . .
>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<


REPLACEMENTS                                            *spell-REP*

In the affix file REP items can be used to define common mistakes. This is
used to make spelling suggestions. The items define the "from" text and the
"to" replacement. Example:

        REP 4 ~
        REP f ph ~
        REP ph f ~
        REP k ch ~
        REP ch k ~

The first line specifies the number of REP lines following. Vim ignores the
number, but it must be there.

Don't include simple one-character replacements or swaps. Vim will try these
anyway. You can include whole words if you want to, but you might want to use
the "file:" item in 'spellsuggest' instead.

You can include a space by using an underscore:

        REP the_the the ~

Bram Moolenaar9ba0eb82005-06-13 22:28:56 +00001131
Bram Moolenaar6f16eb82005-08-23 21:02:42 +00001132SIMILAR CHARACTERS *spell-MAP*
Bram Moolenaar9ba0eb82005-06-13 22:28:56 +00001133
Bram Moolenaard042c562005-06-30 22:04:15 +00001134In the affix file MAP items can be used to define letters that are very much
Bram Moolenaar9ba0eb82005-06-13 22:28:56 +00001135alike. This is mostly used for a letter with different accents. This is used
1136to prefer suggestions with these letters substituted. Example:
1137
1138 MAP 2 ~
1139 MAP eéëêè ~
1140 MAP uüùúû ~
1141
Bram Moolenaar6e7c7f32005-08-24 22:16:11 +00001142The first line specifies the number of MAP lines following. Vim ignores the
1143number, but the line must be there.
Bram Moolenaar9ba0eb82005-06-13 22:28:56 +00001144
Bram Moolenaard042c562005-06-30 22:04:15 +00001145Each letter must appear in only one of the MAP items. It's a bit more
1146efficient if the first letter is ASCII or at least one without accents.
Bram Moolenaare7566042005-06-17 22:00:15 +00001147

SOUND-A-LIKE						*spell-SAL*

In the affix file SAL items can be used to define the sounds-a-like mechanism
to be used. The main items define the "from" text and the "to" replacement.
Simplistic example:

	SAL CIA			X ~
	SAL CH			X ~
	SAL C			K ~
	SAL K			K ~

There are a few rules and this can become quite complicated. An explanation
of how it works can be found in the Aspell manual:
http://aspell.net/man-html/Phonetic-Code.html.

There are a few special items:

	SAL followup		true ~
	SAL collapse_result	true ~
	SAL remove_accents	true ~

"1" has the same meaning as "true". Any other value means "false".

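How the special items differ from the main items can be sketched in Python.
This only separates the two kinds of SAL lines and does not apply the
phonetic rules. The function name parse_sal() is made up for this example,
and it assumes that a special item that is absent counts as "false":

```python
SPECIAL_ITEMS = ("followup", "collapse_result", "remove_accents")


def parse_sal(lines):
    """Split SAL lines into special item flags and (from, to) rules."""
    # Assumption: a special item that is not given is false.
    flags = {name: False for name in SPECIAL_ITEMS}
    rules = []
    for line in lines:
        fields = line.split()
        if len(fields) < 3 or fields[0] != "SAL":
            continue
        if fields[1] in SPECIAL_ITEMS:
            # "1" has the same meaning as "true", anything else is false.
            flags[fields[1]] = fields[2] in ("true", "1")
        else:
            rules.append((fields[1], fields[2]))
    return flags, rules


flags, rules = parse_sal([
    "SAL CIA X",
    "SAL CH X",
    "SAL C K",
    "SAL K K",
    "SAL followup true",
    "SAL collapse_result 1",
    "SAL remove_accents yes",
])
```

In this call "followup" and "collapse_result" are switched on, while
"remove_accents" is off, because "yes" is neither "true" nor "1".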

SIMPLE SOUNDFOLDING		*spell-SOFOFROM* *spell-SOFOTO*

The SAL mechanism is complex and slow. A simpler mechanism is mapping all
characters to another character, mapping similar sounding characters to the
same character. At the same time this does case folding. You cannot have
both SAL items and simple soundfolding.

There are two items required: one to specify the characters that are mapped
and one that specifies the characters they are mapped to. They must have
exactly the same number of characters. Example:

	SOFOFROM abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ ~
	SOFOTO   ebctefghejklnnepkrstevvkesebctefghejklnnepkrstevvkes ~

In the example all vowels are mapped to the same character 'e'. Another
method would be to leave out all vowels. Some characters that sound nearly
the same and are often mixed up, such as 'm' and 'n', are mapped to the same
character. Don't do this too much, or all words will start looking alike.

Characters that do not appear in SOFOFROM will be left out, except that all
white space is replaced by one space. Sequences of the same character in
SOFOFROM are replaced by one.

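The simple soundfolding described above can be sketched in Python, using the
SOFOFROM and SOFOTO strings from the example. This is an illustration, not
Vim's implementation. The function name simple_soundfold() is made up, and
the sketch assumes that repeated characters are collapsed in the result,
after mapping:

```python
SOFOFROM = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
SOFOTO = "ebctefghejklnnepkrstevvkesebctefghejklnnepkrstevvkes"


def simple_soundfold(word, sofofrom=SOFOFROM, sofoto=SOFOTO):
    """Fold a word with a one-to-one character mapping."""
    if len(sofofrom) != len(sofoto):
        raise ValueError("SOFOFROM and SOFOTO must have the same length")
    table = dict(zip(sofofrom, sofoto))
    result = []
    for char in word:
        if char.isspace():
            folded = " "        # all white space becomes one space
        else:
            folded = table.get(char)
            if folded is None:
                continue        # not in SOFOFROM: left out
        # Assumption: a run of the same folded character is stored once.
        if not result or result[-1] != folded:
            result.append(folded)
    return "".join(result)


same = simple_soundfold("moon") == simple_soundfold("noon")
```

With the example mapping "moon" and "noon" fold to the same text, since 'm'
and 'n' map to the same character. This also shows why mapping too many
characters together makes words hard to tell apart.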
You can use the |soundfold()| function to try out the results. Or set the
'verbose' option to see the score in the output of the |z=| command.


 vim:tw=78:sw=4:ts=8:ft=help:norl: