*spell.txt*     For Vim version 7.0aa.  Last change: 2006 Jan 11


                  VIM REFERENCE MANUAL    by Bram Moolenaar


Spell checking                                          *spell*

1. Quick start                  |spell-quickstart|
2. Remarks on spell checking    |spell-remarks|
3. Generating a spell file      |spell-mkspell|
4. Spell file format            |spell-file-format|

{Vi does not have any of these commands}

Spell checking is not available when the |+syntax| feature has been disabled
at compile time.

==============================================================================
1. Quick start                                          *spell-quickstart*

This command switches on spell checking: >

        :setlocal spell spelllang=en_us

This switches on the 'spell' option and specifies to check for US English.
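
If you want this to happen automatically, an autocommand in your vimrc is one
possible way.  This is only a sketch: the "*.txt" pattern and the "en_us"
language are assumptions, adjust them to the files you edit: >

        :autocmd BufRead,BufNewFile *.txt setlocal spell spelllang=en_us

<To switch spell checking off again in the current buffer: >

        :setlocal nospell
<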

The words that are not recognized are highlighted with one of these:
        SpellBad        word not recognized                     |hl-SpellBad|
        SpellCap        word not capitalised                    |hl-SpellCap|
        SpellRare       rare word                               |hl-SpellRare|
        SpellLocal      wrong spelling for selected region      |hl-SpellLocal|

Vim only checks words for spelling, there is no grammar check.

If the 'mousemodel' option is set to "popup" and the cursor is on a badly
spelled word or it is "popup_setpos" and the mouse pointer is on a badly
spelled word, then the popup menu will contain a submenu to replace the bad
word.  Note: this slows down the appearance of the popup menu.

To search for the next misspelled word:

                                                        *]s* *E756*
]s                      Move to next misspelled word after the cursor.
                        A count before the command can be used to repeat.
                        'wrapscan' applies.

                                                        *[s*
[s                      Like "]s" but search backwards, find the misspelled
                        word before the cursor.  Doesn't recognize words
                        split over two lines, thus may stop at words that are
                        not highlighted as bad.  Does not stop at word with
                        missing capital at the start of a line.

                                                        *]S*
]S                      Like "]s" but only stop at bad words, not at rare
                        words or words for another region.

                                                        *[S*
[S                      Like "]S" but search backwards.


To add words to your own word list:

                                                        *zg*
zg                      Add word under the cursor as a good word to the first
                        name in 'spellfile'.  A count may precede the command
                        to indicate the entry in 'spellfile' to be used.  A
                        count of two uses the second entry.

                        In Visual mode the selected characters are added as a
                        word (including white space!).
                        When the cursor is on text that is marked as badly
                        spelled then the marked text is used.
                        Otherwise the word under the cursor, separated by
                        non-word characters, is used.

                        If the word is explicitly marked as a bad word in
                        another spell file the result is unpredictable.

                                                        *zG*
zG                      Like "zg" but add the word to the internal word list
                        |internal-wordlist|.

                                                        *zw*
zw                      Like "zg" but mark the word as a wrong (bad) word.

                                                        *zW*
zW                      Like "zw" but add the word to the internal word list
                        |internal-wordlist|.

                                                *:spe* *:spellgood*
:[count]spe[llgood] {word}
                        Add {word} as a good word to 'spellfile', like with
                        "zg".  Without count the first name is used, with a
                        count of two the second entry, etc.

:spe[llgood]! {word}    Add {word} as a good word to the internal word list,
                        like with "zG".

                                                *:spellw* *:spellwrong*
:[count]spellw[rong] {word}
                        Add {word} as a wrong (bad) word to 'spellfile', as
                        with "zw".  Without count the first name is used, with
                        a count of two the second entry, etc.

:spellw[rong]! {word}   Add {word} as a wrong (bad) word to the internal word
                        list, like with "zW".

After adding a word to 'spellfile' with the above commands its associated
".spl" file will automatically be updated and reloaded.  If you change
'spellfile' manually you need to use the |:mkspell| command.  This sequence of
commands mostly works well: >
        :edit <file in 'spellfile'>
<       (make changes to the spell file) >
        :mkspell! %

More details about the 'spellfile' format below |spell-wordlist-format|.

                                                        *internal-wordlist*
The internal word list is used for all buffers where 'spell' is set.  It is
not stored, it is lost when you exit Vim.  It is also cleared when 'encoding'
is set.
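
As an illustration of the count, here is a sketch with two word lists.  The
file names are made up, 'encoding' is assumed to be "utf-8" and each name is
expected to end in ".{encoding}.add": >

        :set spellfile=~/.vim/spell/en.utf-8.add,~/.vim/spell/project.utf-8.add

<With this setting "zg" adds the word under the cursor to the first file and
"2zg" adds it to the second one.  A word that should only be accepted for the
current session can go to the internal word list instead ("frobnicate" is just
an example word): >

        :spellgood! frobnicate
<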


Finding suggestions for bad words:
                                                        *z=*
z=                      For the word under/after the cursor suggest correctly
                        spelled words.  This also works to find alternatives
                        for a word that is not highlighted as a bad word,
                        e.g., when the word after it is bad.
                        The results are sorted on similarity to the word
                        under/after the cursor.
                        This may take a long time.  Hit CTRL-C when you get
                        bored.

                        If the command is used without a count the
                        alternatives are listed and you can enter the number
                        of your choice or press <Enter> if you don't want to
                        replace.  You can also use the mouse to click on your
                        choice (only works if the mouse can be used in Normal
                        mode and when there are no line wraps).  Click on the
                        first line (the header) to cancel.

                        If a count is used that suggestion is used, without
                        prompting.  For example, "1z=" always takes the first
                        suggestion.

                        If 'verbose' is non-zero a score will be displayed
                        with the suggestions to indicate the similarity to the
                        badly spelled word (the higher the score the more
                        different).
                        When a word was replaced the redo command "." will
                        repeat the word replacement.  This works like "ciw",
                        the good word and <Esc>.  This does NOT work for Thai
                        and other languages without spaces between words.

                                        *:spellr* *:spellrepall* *E752* *E753*
:spellr[epall]          Repeat the replacement done by |z=| for all matches
                        with the replaced word in the current window.
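
A typical sequence for fixing a mistake that was made several times could look
like this sketch: two Normal mode commands followed by one Ex command.  Taking
the first suggestion is an assumption, check that it is the word you want: >

        ]s
        1z=
        :spellrepall

<"]s" jumps to the next misspelled word, "1z=" replaces it with the first
suggestion and ":spellrepall" repeats that replacement for the other matches
in the window.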

In Insert mode, when the cursor is after a badly spelled word, you can use
CTRL-X s to find suggestions.  This works like Insert mode completion.  Use
CTRL-N to use the next suggestion, CTRL-P to go back. |i_CTRL-X_s|

The 'spellsuggest' option influences how the list of suggestions is generated
and sorted.  See |'spellsuggest'|.

The 'spellcapcheck' option is used to check that the first word of a sentence
starts with a capital.  This doesn't work for the first word in the file.
When there is a line break right after a sentence the highlighting of the next
line may be postponed.  Use |CTRL-L| when needed.  Also see |set-spc-auto| for
how it can be set automatically when 'spelllang' is set.

Vim counts the number of times a good word is encountered.  This is used to
sort the suggestions: words that have been seen before get a small bonus,
words that have been seen often get a bigger bonus.  The COMMON item in the
affix file can be used to define common words, so that this mechanism also
works in a new or short file |spell-COMMON|.

==============================================================================
2. Remarks on spell checking                            *spell-remarks*

PERFORMANCE

Vim does on-the-fly spell checking.  To make this work fast the word list is
loaded in memory.  Thus this uses a lot of memory (1 Mbyte or more).  There
might also be a noticeable delay when the word list is loaded, which happens
when 'spell' is set and when 'spelllang' is set while 'spell' was already set.
To minimize the delay each word list is only loaded once, it is not deleted
when 'spelllang' is made empty or 'spell' is reset.  When 'encoding' is set
all the word lists are reloaded, thus you may notice a delay then too.


REGIONS

A word may be spelled differently in various regions.  For example, English
comes in (at least) these variants:

        en              all regions
        en_au           Australia
        en_ca           Canada
        en_gb           Great Britain
        en_nz           New Zealand
        en_us           USA

Words that are not used in one region but are used in another region are
highlighted with SpellLocal |hl-SpellLocal|.

Always use lowercase letters for the language and region names.

When adding a word with |zg| or another command it's always added for all
regions.  You can change that by manually editing the 'spellfile'.  See
|spell-wordlist-format|.  Note that the regions as specified in the files in
'spellfile' are only used when all entries in 'spelllang' specify the same
region (not counting files specified by their .spl name).

                                                        *spell-german*
Specific exception: For German these special regions are used:
        de              all German words accepted
        de_de           old and new spelling
        de_19           old spelling
        de_20           new spelling
        de_at           Austria
        de_ch           Switzerland

                                                        *spell-russian*
Specific exception: For Russian these special regions are used:
        ru              all Russian words accepted
        ru_ru           "IE" letter spelling
        ru_yo           "YO" letter spelling

                                                        *spell-yiddish*
Yiddish requires using "utf-8" encoding, because of the special characters
used.  If you are using latin1 Vim will use transliterated (romanized) Yiddish
instead.  If you want to use transliterated Yiddish with utf-8 use "yi-tr".
In a table:
        'encoding'      'spelllang'
        utf-8           yi              Yiddish
        latin1          yi              transliterated Yiddish
        utf-8           yi-tr           transliterated Yiddish
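
To illustrate how a region is selected, these settings use names from the
lists above.  This is a sketch that assumes the matching spell files are
installed: >

        :setlocal spelllang=en_gb
        :setlocal spelllang=de_20
        :setlocal spelllang=en

<The first accepts British spelling and marks words from other English regions
with SpellLocal, the second accepts only the new German spelling and the last
accepts English words from all regions.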


SPELL FILES                                             *spell-load*

Vim searches for spell files in the "spell" subdirectory of the directories in
'runtimepath'.  The name is: LL.EEE.spl, where:
        LL      the language name
        EEE     the value of 'encoding'

The value for "LL" comes from 'spelllang', but excludes the region name.
Examples:
        'spelllang'     LL ~
        en_us           en
        en-rare         en-rare
        medical_ca      medical

Only the first file is loaded, the one that is first in 'runtimepath'.  If
this succeeds then additionally files with the name LL.EEE.add.spl are loaded.
All the ones that are found are used.

Additionally, the files related to the names in 'spellfile' are loaded.  These
are the files that |zg| and |zw| add good and wrong words to.

Exceptions:
- Vim uses "latin1" when 'encoding' is "iso-8859-15".  The euro sign doesn't
  matter for spelling.
- When no spell file for 'encoding' is found "ascii" is tried.  This only
  works for languages where nearly all words are ASCII, such as English.  It
  helps when 'encoding' is not "latin1", such as iso-8859-2, and English text
  is being edited.  For the ".add" files the same name as the found main
  spell file is used.

For example, with these values:
        'runtimepath' is "~/.vim,/usr/share/vim70,~/.vim/after"
        'encoding'    is "iso-8859-2"
        'spelllang'   is "pl"

Vim will look for:
1. ~/.vim/spell/pl.iso-8859-2.spl
2. /usr/share/vim70/spell/pl.iso-8859-2.spl
3. ~/.vim/spell/pl.iso-8859-2.add.spl
4. /usr/share/vim70/spell/pl.iso-8859-2.add.spl
5. ~/.vim/after/spell/pl.iso-8859-2.add.spl

This assumes 1. is not found and 2. is found.

If 'encoding' is "latin1" Vim will look for:
1. ~/.vim/spell/pl.latin1.spl
2. /usr/share/vim70/spell/pl.latin1.spl
3. ~/.vim/after/spell/pl.latin1.spl
4. ~/.vim/spell/pl.ascii.spl
5. /usr/share/vim70/spell/pl.ascii.spl
6. ~/.vim/after/spell/pl.ascii.spl

This assumes none of them are found (Polish doesn't make sense when leaving
out the non-ASCII characters).
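
To see which spell files are present for a language you can imitate the search
with |globpath()|.  This only approximates what Vim does when loading; "pl" is
the language from the example above: >

        :echo globpath(&runtimepath, "spell/pl.*.spl")

<This lists the matching files from all 'runtimepath' directories, one per
line, including any ".add.spl" files.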

Spelling for EBCDIC is currently not supported.

A spell file might not be available in the current 'encoding'.  See
|spell-mkspell| about how to create a spell file.  Converting a spell file
with "iconv" will NOT work!

                                                        *spell-sug-file*
If there is a file with exactly the same name as the ".spl" file but ending in
".sug", that file will be used for giving better suggestions.  To reduce
memory use it is not loaded until suggestions are made.

                                                        *E758* *E759*
When loading a spell file Vim checks that it is properly formatted.  If you
get an error the file may be truncated, modified or intended for another Vim
version.


WORDS

Vim uses a fixed method to recognize a word.  This is independent of
'iskeyword', so that it also works in help files and for languages that
include characters like '-' in 'iskeyword'.  The word characters do depend on
'encoding'.

The table with word characters is stored in the main .spl file.  Therefore it
matters what the current locale is when generating it!  A .add.spl file does
not contain a word table though.

A word that starts with a digit is always ignored.  That includes hex numbers
in the form 0xff and 0XFF.


WORD COMBINATIONS

It is possible to spell-check words that include a space.  This is used to
recognize words that are invalid when used by themselves, e.g. for "et al.".
It can also be used to recognize "the the" and highlight it.

The number of spaces is irrelevant.  In most cases a line break may also
appear.  However, this makes it difficult to find out where to start checking
for spelling mistakes.  When you make a change to one line and only that line
is redrawn Vim won't look in the previous line, thus when "et" is at the end
of the previous line "al." will be flagged as an error.  And when you type
"the<CR>the" the highlighting doesn't appear until the first line is redrawn.
Use |CTRL-L| to redraw right away.  "[s" will also stop at a word combination
with a line break.

When encountering a line break Vim skips characters such as '*', '>' and '"',
so that comments in C, shell and Vim code can be spell checked.


SYNTAX HIGHLIGHTING                                     *spell-syntax*

Files that use syntax highlighting can specify where spell checking should be
done:

1.  everywhere                     default
2.  in specific items              use "contains=@Spell"
3.  everywhere but specific items  use "contains=@NoSpell"

For the second method adding the @NoSpell cluster will disable spell checking
again.  This can be used, for example, to add @Spell to the comments of a
program, and add @NoSpell for items that shouldn't be checked.
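
As a sketch of the second method, a syntax file could contain items like the
following.  The group names "myComment" and "myCommentURL" and both patterns
are made up for this illustration: >

        :syntax region myComment start="/\*" end="\*/"
                        \ contains=@Spell,myCommentURL
        :syntax match myCommentURL "https\=://\S\+" contained
                        \ contains=@NoSpell

<With these items the text inside comments is checked, except for URLs.  The
line continuation with a backslash only works in a script, not when typing the
commands.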


VIM SCRIPTS

If you want to write a Vim script that does something with spelling, you may
find these functions useful:

    spellbadword()      find badly spelled word at the cursor
    spellsuggest()      get list of spelling suggestions
    soundfold()         get the sound-a-like version of a word
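
A minimal sketch of calling two of these functions.  'spell' must be set for
them to give useful results; the misspelled word "recieve" and the limit of
five suggestions are arbitrary choices: >

        :setlocal spell spelllang=en_us
        :echo spellsuggest("recieve", 5)
        :echo soundfold("recieve")

<The first ":echo" shows a List with at most five suggestions, the second one
shows the sound-folded form of the word.  What exactly is displayed depends on
the spell files in use.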


SETTING 'spellcapcheck' AUTOMATICALLY                   *set-spc-auto*

After the 'spelllang' option has been set successfully, Vim will source the
files "spell/LANG.vim" in 'runtimepath'.  "LANG" is the value of 'spelllang'
up to the first comma, dot or underscore.  This can be used to set options
specifically for the language, especially 'spellcapcheck'.

The distribution includes a few of these files.  Use this command to see what
they do: >
        :next $VIMRUNTIME/spell/*.vim

Note that the default scripts don't set 'spellcapcheck' if it was changed from
the default value.  This assumes the user prefers another value then.
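
A file of your own could look like this sketch.  The language name "xx" and
the location are hypothetical; an empty 'spellcapcheck' switches off the check
for a capital at the start of a sentence: >

        " ~/.vim/spell/xx.vim
        setlocal spellcapcheck=

<Unlike the default scripts this sketch sets the option unconditionally.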


DOUBLE SCORING                                          *spell-double-scoring*

The 'spellsuggest' option can be used to select "double" scoring.  This
mechanism is based on the principle that there are two kinds of spelling
mistakes:

1. You know how to spell the word, but mistype something.  This results in a
   small editing distance (character swapped/omitted/inserted) and possibly a
   word that sounds completely different.

2. You don't know how to spell the word and type something that sounds right.
   The edit distance can be big but the word is similar after sound-folding.

Since scores for these two mistakes will be very different we use a list
for each and mix them.

The sound-folding is slow and people that know the language won't make the
second kind of mistakes.  Therefore 'spellsuggest' can be set to select the
preferred method for scoring the suggestions.
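
For example, assuming the value names listed at |'spellsuggest'| ("double" is
mentioned above, "fast" is taken from that option's description): >

        :set spellsuggest=double
        :set spellsuggest=fast

<The first selects the mixed scoring explained here, the second is meant for
people who only make the first kind of mistake and want to avoid the slow
sound-folding.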

==============================================================================
3. Generating a spell file                              *spell-mkspell*

Vim uses a binary file format for spelling.  This greatly speeds up loading
the word list and keeps it small.
                                                *.aff* *.dic* *Myspell*
You can create a Vim spell file from the .aff and .dic files that Myspell
uses.  Myspell is used by OpenOffice.org and Mozilla.  You should be able to
find them here:
        http://lingucomponent.openoffice.org/spell_dic.html
You can also use a plain word list.  The results are the same, the choice
depends on what word lists you can find.

If you install Aap (from www.a-a-p.org) you can use the recipes in the
runtime/spell/??/ directories.  Aap will take care of downloading the files,
apply patches needed for Vim and build the .spl file.

Make sure your current locale is set properly, otherwise Vim doesn't know what
characters are upper/lower case letters.  If the locale isn't available (e.g.,
when using an MS-Windows codepage on Unix) add tables to the .aff file
|spell-affix-chars|.  If the .aff file doesn't define a table then the word
table of the currently active spelling is used.  If spelling is not active
then Vim will try to guess.

                                                        *:mksp* *:mkspell*
:mksp[ell][!] [-ascii] {outname} {inname} ...
                        Generate a Vim spell file from word lists.  Example: >
                :mkspell /tmp/nl nl_NL.words
<                                                       *E751*
                        When {outname} ends in ".spl" it is used as the output
                        file name.  Otherwise it should be a language name,
                        such as "en", without the region name.  The file
                        written will be "{outname}.{encoding}.spl", where
                        {encoding} is the value of the 'encoding' option.

                        When the output file already exists [!] must be used
                        to overwrite it.

                        When the [-ascii] argument is present, words with
                        non-ascii characters are skipped.  The resulting file
                        ends in "ascii.spl".

                        The input can be the Myspell format files {inname}.aff
                        and {inname}.dic.  If {inname}.aff does not exist then
                        {inname} is used as the file name of a plain word
                        list.

                        Multiple {inname} arguments can be given to combine
                        regions into one Vim spell file.  Example: >
                :mkspell ~/.vim/spell/en /tmp/en_US /tmp/en_CA /tmp/en_AU
<                       This combines the English word lists for US, CA and AU
                        into one en.spl file.
                        Up to eight regions can be combined. *E754* *E755*
                        The REP and SAL items of the first .aff file where
                        they appear are used. |spell-REP| |spell-SAL|

                        This command uses a lot of memory, required to find
                        the optimal word tree (Polish, Italian and Hungarian
                        require several hundred Mbyte).  The final result will
                        be much smaller, because compression is used.  To
                        avoid running out of memory compression will be done
                        now and then.  This can be tuned with the 'mkspellmem'
                        option.

                        After the spell file was written and it was being used
                        in a buffer it will be reloaded automatically.

:mksp[ell] [-ascii] {name}.{enc}.add
                        Like ":mkspell" above, using {name}.{enc}.add as the
                        input file and producing an output file in the same
                        directory that has ".spl" appended.

:mksp[ell] [-ascii] {name}
                        Like ":mkspell" above, using {name} as the input file
                        and producing an output file in the same directory
                        that has ".{enc}.spl" appended.
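
For instance, after editing a word list by hand the matching .spl file can be
regenerated with the ".add" form.  The file name is only an illustration and
the "!" is used as in the ":mkspell! %" example in |spell-quickstart|, to
overwrite the existing file: >

        :mkspell! ~/.vim/spell/en.utf-8.add

<This should write "en.utf-8.add.spl" in the same directory.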

Vim will report the number of duplicate words.  This might be a mistake in the
list of words.  But sometimes it is used to have different prefixes and
suffixes for the same basic word to avoid them combining (e.g. Czech uses
this).  If you want Vim to report all duplicate words set the 'verbose'
option.

Since you might want to change a Myspell word list for use with Vim the
following procedure is recommended:

1. Obtain the xx_YY.aff and xx_YY.dic files from Myspell.
2. Make a copy of these files to xx_YY.orig.aff and xx_YY.orig.dic.
3. Change the xx_YY.aff and xx_YY.dic files to remove bad words, add missing
   words, define word characters with FOL/LOW/UPP, etc.  The distributed
   "src/spell/*.diff" files can be used.
4. Start Vim with the right locale and use |:mkspell| to generate the Vim
   spell file.
5. Try out the spell file with ":set spell spelllang=xx" if you wrote it in
   a spell directory in 'runtimepath', or ":set spelllang=xx.enc.spl" if you
   wrote it somewhere else.
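
Steps 4 and 5 could look like this sketch.  It uses the placeholder names from
the list and assumes that the xx_YY files are in the current directory, that
~/.vim is in 'runtimepath' and that the ~/.vim/spell directory exists: >

        :mkspell ~/.vim/spell/xx xx_YY
        :setlocal spell spelllang=xx
<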

When the Myspell files are updated you can merge the differences:
1. Obtain the new Myspell files as xx_YY.new.aff and xx_YY.new.dic.
2. Use Vimdiff to see what changed: >
        vimdiff xx_YY.orig.dic xx_YY.new.dic
3. Take over the changes you like in xx_YY.dic.
   You may also need to change xx_YY.aff.
4. Rename xx_YY.new.dic to xx_YY.orig.dic and xx_YY.new.aff to xx_YY.orig.aff.


SPELL FILE VERSIONS                                     *E770* *E771* *E772*

Spell checking is a relatively new feature in Vim, thus it's possible that the
.spl file format will be changed to support more languages.  Vim will check
the validity of the spell file and report anything wrong.

        E771: Old spell file, needs to be updated ~
This spell file is older than your Vim.  You need to update the .spl file.

        E772: Spell file is for newer version of Vim ~
This means the spell file was made for a later version of Vim.  You need to
update Vim.

        E770: Unsupported section in spell file ~
This means the spell file was made for a later version of Vim and contains a
section that is required for the spell file to work.  In this case it's
probably a good idea to upgrade your Vim.


SPELL FILE DUMP

If for some reason you want to check what words are supported by the currently
used spelling files, use this command:

                                                        *:spelldump* *:spelld*
:spelld[ump]            Open a new window and fill it with all currently valid
                        words.  Compound words are not included.
                        Note: For some languages the result may be enormous,
                        causing Vim to run out of memory.

:spelld[ump]!           Like ":spelldump" and include the word count.  This is
                        the number of times the word was found while
                        updating the screen.  Words that are in COMMON items
                        get a starting count of 10.

The dump uses the format of a word list |spell-wordlist-format|.  You should
be able to read it with ":mkspell" to generate one .spl file that includes all
the words.

When all entries in 'spelllang' use the same regions or no regions at all then
the region information is included in the dumped words.  Otherwise only words
for the current region are included and no "/regions" line is generated.

Comment lines with the name of the .spl file are used as a header above the
words that were generated from that .spl file.
Bram Moolenaar3b506942005-06-23 22:36:45 +0000560
==============================================================================
4. Spell file format                            *spell-file-format*
Bram Moolenaar217ad922005-03-20 22:37:15 +0000563
Bram Moolenaar13fcaaf2005-04-15 21:13:42 +0000564This is the format of the files that are used by the person who creates and
565maintains a word list.
Bram Moolenaar217ad922005-03-20 22:37:15 +0000566
Bram Moolenaar13fcaaf2005-04-15 21:13:42 +0000567Note that we avoid the word "dictionary" here. That is because the goal of
568spell checking differs from writing a dictionary (as in the book). For
Bram Moolenaar16d8f872005-11-26 23:46:11 +0000569spelling we need a list of words that are OK, thus should not be highlighted.
570Person and company names will not appear in a dictionary, but do appear in a
571word list. And some old words are rarely used while they are common
572misspellings. These do appear in a dictionary but not in a word list.
Bram Moolenaar217ad922005-03-20 22:37:15 +0000573
Bram Moolenaar7d1f5db2005-07-03 21:39:27 +0000574There are two formats: A straight list of words and a list using affix
Bram Moolenaard042c562005-06-30 22:04:15 +0000575compression. The files with affix compression are used by Myspell (Mozilla
576and OpenOffice.org). This requires two files, one with .aff and one with .dic
577extension.
Bram Moolenaar75c50c42005-06-04 22:06:24 +0000578
579
FORMAT OF STRAIGHT WORD LIST                    *spell-wordlist-format*
Bram Moolenaar75c50c42005-06-04 22:06:24 +0000581
Bram Moolenaar82cf9b62005-06-07 21:09:25 +0000582The words must appear one per line. That is all that is required.
Bram Moolenaard042c562005-06-30 22:04:15 +0000583
Bram Moolenaar82cf9b62005-06-07 21:09:25 +0000584Additionally the following items are recognized:
Bram Moolenaard042c562005-06-30 22:04:15 +0000585
Bram Moolenaar75c50c42005-06-04 22:06:24 +0000586- Empty and blank lines are ignored.
Bram Moolenaard042c562005-06-30 22:04:15 +0000587
Bram Moolenaar4770d092006-01-12 23:22:24 +0000588 # comment ~
Bram Moolenaar75c50c42005-06-04 22:06:24 +0000589- Lines starting with a # are ignored (comment lines).
Bram Moolenaard042c562005-06-30 22:04:15 +0000590
Bram Moolenaar4770d092006-01-12 23:22:24 +0000591 /encoding=utf-8 ~
- A line starting with "/encoding=", before any word, specifies the encoding
  of the file. After the '=' comes an encoding name. This tells Vim to set up
  conversion from the specified encoding to 'encoding'. Thus you can use one
  word list for several target encodings.
596
Bram Moolenaar4770d092006-01-12 23:22:24 +0000597 /regions=usca ~
Bram Moolenaar3638c682005-06-08 22:05:14 +0000598- A line starting with "/regions=" specifies the region names that are
599 supported. Each region name must be two ASCII letters. The first one is
600 region 1. Thus "/regions=usca" has region 1 "us" and region 2 "ca".
  In an additional word list the region names should be equal to those of the
  main word list!
603
Bram Moolenaar82cf9b62005-06-07 21:09:25 +0000604- Other lines starting with '/' are reserved for future use. The ones that
Bram Moolenaar4770d092006-01-12 23:22:24 +0000605 are not recognized are ignored. You do get a warning message, so that you
606 know something won't work.
Bram Moolenaar75c50c42005-06-04 22:06:24 +0000607
Bram Moolenaar1f8a5f02005-07-01 22:41:52 +0000608- A "/" may follow the word with the following items:
609 = Case must match exactly.
610 ? Rare word.
611 ! Bad (wrong) word.
612 digit A region in which the word is valid. If no regions are
613 specified the word is valid in all regions.
614
Bram Moolenaar3638c682005-06-08 22:05:14 +0000615Example:
616
    # This is an example word list      comment
    /encoding=latin1                    encoding of the file
    /regions=uscagb                     regions "us", "ca" and "gb"
    example                             word for all regions
    blah/12                             word for regions "us" and "ca"
    vim/!                               bad word
    Campbell/?3                         rare word in region 3 "gb"
    's mornings/=                       keep-case word
Bram Moolenaar3638c682005-06-08 22:05:14 +0000625
Bram Moolenaar0dc065e2005-07-04 22:49:24 +0000626Note that when "/=" is used the same word with all upper-case letters is not
627accepted. This is different from a word with mixed case that is automatically
628marked as keep-case, those words may appear in all upper-case letters.
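
The rules for the straight word list can be sketched in Python. This is only
an illustration of the format described above, not the code Vim uses; the
function name and the layout of the returned values are made up for this
example. It does not check that "/encoding=" comes before any word and it
skips unrecognized '/' lines silently, where Vim gives a warning.

```python
def parse_word_list(lines):
    """Parse straight word list lines into (settings, entries)."""
    settings = {"encoding": None, "regions": []}
    entries = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue  # empty, blank and comment lines are ignored
        if line.startswith("/encoding="):
            settings["encoding"] = line[len("/encoding="):]
        elif line.startswith("/regions="):
            names = line[len("/regions="):]
            # Each region name is two ASCII letters; the first is region 1.
            settings["regions"] = [names[i:i + 2]
                                   for i in range(0, len(names), 2)]
        elif line.startswith("/"):
            pass  # reserved for future use; this sketch just skips it
        else:
            # An optional "/" separates the word from its items.
            word, _slash, items = line.partition("/")
            entries.append({
                "word": word,
                "keep_case": "=" in items,
                "rare": "?" in items,
                "bad": "!" in items,
                # An empty list means: valid in all regions.
                "regions": [int(c) for c in items if c.isdigit()],
            })
    return settings, entries
```

For the example word list above (without the explanations in the right
column) this gives the regions "us", "ca" and "gb", and "blah" is valid in
regions 1 and 2 only.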
629
Bram Moolenaar75c50c42005-06-04 22:06:24 +0000630
FORMAT WITH .AFF AND .DIC FILES
Bram Moolenaar75c50c42005-06-04 22:06:24 +0000632
Bram Moolenaar4770d092006-01-12 23:22:24 +0000633There are two files: the basic word list and an affix file. The affix file
634specifies settings for the language and can contain affixes. The affixes are
Bram Moolenaar13fcaaf2005-04-15 21:13:42 +0000635used to modify the basic words to get the full word list. This significantly
636reduces the number of words, especially for a language like Polish. This is
637called affix compression.
Bram Moolenaar217ad922005-03-20 22:37:15 +0000638
The basic word list and the affix file are combined with the ":mkspell"
command, which results in a binary spell file. All the preprocessing has been
done, thus this file loads fast. The binary spell file format is described in
the source code (src/spell.c), but only developers need to know about it.
Bram Moolenaar13fcaaf2005-04-15 21:13:42 +0000643
644The preprocessing also allows us to take the Myspell language files and modify
645them before the Vim word list is made. The tools for this can be found in the
646"src/spell" directory.
647
Bram Moolenaar6f16eb82005-08-23 21:02:42 +0000648The format for the affix and word list files is based on what Myspell uses
649(the spell checker of Mozilla and OpenOffice.org). A description can be found
650here:
651 http://lingucomponent.openoffice.org/affix.readme ~
Note that affixes are case sensitive; this isn't obvious from the description.
653
Bram Moolenaar6f16eb82005-08-23 21:02:42 +0000654Vim supports quite a few extras. They are described below |spell-affix-vim|.
655Attempts have been made to keep this compatible with other spell checkers, so
Bram Moolenaar4770d092006-01-12 23:22:24 +0000656that the same files can often be used. One other project that offers more
657than Myspell is Hunspell ( http://hunspell.sf.net ).
Bram Moolenaar6f16eb82005-08-23 21:02:42 +0000658
Bram Moolenaar13fcaaf2005-04-15 21:13:42 +0000659
WORD LIST FORMAT                                *spell-dic-format*
Bram Moolenaar13fcaaf2005-04-15 21:13:42 +0000661
Bram Moolenaar4770d092006-01-12 23:22:24 +0000662A short example, with line numbers:
Bram Moolenaar13fcaaf2005-04-15 21:13:42 +0000663
Bram Moolenaar4770d092006-01-12 23:22:24 +0000664 1 1234 ~
665 2 aan ~
666 3 Als ~
667 4 Etten-Leur ~
668 5 et al. ~
669 6 's-Gravenhage ~
670 7 's-Gravenhaags ~
671 8 # word that differs between regions ~
672 9 kado/1 ~
673 10 cadeau/2 ~
674 11 TCP,IP ~
675 12 /the S affix may add a 's' ~
676 13 bedel/S ~
Bram Moolenaar13fcaaf2005-04-15 21:13:42 +0000677
Bram Moolenaar82cf9b62005-06-07 21:09:25 +0000678The first line contains the number of words. Vim ignores it, but you do get
679an error message if it's not there. *E760*
Bram Moolenaar13fcaaf2005-04-15 21:13:42 +0000680
Bram Moolenaar4770d092006-01-12 23:22:24 +0000681What follows is one word per line. White space at the end of the line is
682ignored, all other white space matters. The encoding is specified in the
683affix file |spell-SET|.
684
685Comment lines start with '#' or '/'. See the example lines 8 and 12. Note
686that putting a comment after a word is NOT allowed:
687
688 someword # comment that causes an error! ~
689
690After the word there is an optional slash and flags. Most of these flags are
691letters that indicate the affixes that can be used with this word. These are
692specified with SFX and PFX lines in the .aff file, see |spell-SFX| and
693|spell-PFX|. Vim allows using other flag types with the FLAG item in the
694affix file |spell-FLAG|.
Bram Moolenaar13fcaaf2005-04-15 21:13:42 +0000695
696When the word only has lower-case letters it will also match with the word
697starting with an upper-case letter.
698
699When the word includes an upper-case letter, this means the upper-case letter
700is required at this position. The same word with a lower-case letter at this
701position will not match. When some of the other letters are upper-case it will
702not match either.
703
The word with all upper-case characters will always be OK. Examples:

    word list   matches         does not match ~
    als         als Als ALS     ALs AlS aLs aLS
    Als         Als ALS         als ALs AlS aLs aLS
    ALS         ALS             als Als ALs AlS aLs aLS
    AlS         AlS ALS         als Als ALs aLs aLS
711
Bram Moolenaar1cbe5f72005-12-29 22:51:09 +0000712The KEEPCASE affix ID can be used to specifically match a word with identical
713case only, see below |spell-KEEPCASE|.
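
The case rules above can be written down as a small Python function. This is
a sketch for illustration only; the function name and the "keep_case"
argument are assumptions of this example, and special cases such as the
German sharp s are ignored.

```python
def case_matches(dict_word, text_word, keep_case=False):
    """Return True when text_word is an accepted casing of dict_word."""
    if text_word == dict_word:
        return True  # identical case always matches
    if keep_case:
        return False  # keep-case words only match with identical case
    if text_word == dict_word.upper():
        return True  # the word in all upper-case is always OK
    if dict_word == dict_word.lower():
        # An all lower-case word also matches with a capital first letter.
        return text_word == dict_word[:1].upper() + dict_word[1:]
    return False  # an upper-case letter in the word list is required
```

With the word list entry "als" this accepts "als", "Als" and "ALS" but not
"aLs"; with "AlS" only "AlS" and "ALS" are accepted.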
Bram Moolenaar45eeb132005-06-06 21:59:07 +0000714
Note: in lines 5 to 7 non-word characters are used. You can include any
character in a word. When checking the text a word still only matches when it
appears with a non-word character before and after it. For Myspell a word
starting with a non-word character probably won't work.
Bram Moolenaar13fcaaf2005-04-15 21:13:42 +0000719
In line 11 the word "TCP/IP" is defined. Since the slash has a special
meaning the comma is used instead. This is defined with the SLASH item in the
affix file, see |spell-SLASH|. Note that without this SLASH item the word
will be "TCP,IP".
Bram Moolenaar6f16eb82005-08-23 21:02:42 +0000724
Bram Moolenaar13fcaaf2005-04-15 21:13:42 +0000725
AFFIX FILE FORMAT               *spell-aff-format* *spell-affix-vim*
Bram Moolenaar0dc065e2005-07-04 22:49:24 +0000727
Bram Moolenaar4770d092006-01-12 23:22:24 +0000728 *spell-affix-comment*
729Comment lines in the .aff file start with a '#':
Bram Moolenaar13fcaaf2005-04-15 21:13:42 +0000730
Bram Moolenaar4770d092006-01-12 23:22:24 +0000731 # comment line ~
732
733With some items it's also possible to put a comment after it, but this isn't
734supported in general.
735
736
737ENCODING *spell-SET*
738
739The affix file can be in any encoding that is supported by "iconv". However,
740in some cases the current locale should also be set properly at the time
741|:mkspell| is invoked. Adding FOL/LOW/UPP lines removes this requirement
742|spell-FOL|.
743
744The encoding should be specified before anything where the encoding matters.
745The encoding applies both to the affix file and the dictionary file. It is
746done with a SET line:
747
748 SET utf-8 ~
749
The encoding can be different from the value of the 'encoding' option at the
time ":mkspell" is used. Vim will then convert everything to 'encoding' and
generate a spell file for 'encoding'. If some of the used characters do not
fit in 'encoding' you will get an error message.
Bram Moolenaar13fcaaf2005-04-15 21:13:42 +0000754 *spell-affix-mbyte*
When using a multi-byte encoding it's possible to use a larger number of
different affix flags. But Myspell doesn't support that, thus you may not
want to use it anyway. For compatibility use an 8-bit encoding.
Bram Moolenaar13fcaaf2005-04-15 21:13:42 +0000758
Bram Moolenaare13305e2005-06-19 22:54:15 +0000759
760CHARACTER TABLES
Bram Moolenaar0cb032e2005-04-23 20:52:00 +0000761 *spell-affix-chars*
Bram Moolenaar82cf9b62005-06-07 21:09:25 +0000762When using an 8-bit encoding the affix file should define what characters are
Bram Moolenaar4770d092006-01-12 23:22:24 +0000763word characters. This is because the system where ":mkspell" is used may not
764support a locale with this encoding and isalpha() won't work. For example
765when using "cp1250" on Unix.
Bram Moolenaar6f16eb82005-08-23 21:02:42 +0000766 *E761* *E762* *spell-FOL*
767 *spell-LOW* *spell-UPP*
Bram Moolenaar0cb032e2005-04-23 20:52:00 +0000768Three lines in the affix file are needed. Simplistic example:
769
Bram Moolenaare13305e2005-06-19 22:54:15 +0000770 FOL áëñ ~
771 LOW áëñ ~
772 UPP ÁËÑ ~
Bram Moolenaar0cb032e2005-04-23 20:52:00 +0000773
774All three lines must have exactly the same number of characters.
775
776The "FOL" line specifies the case-folded characters. These are used to
777compare words while ignoring case. For most encodings this is identical to
778the lower case line.
779
780The "LOW" line specifies the characters in lower-case. Mostly it's equal to
781the "FOL" line.
782
783The "UPP" line specifies the characters with upper-case. That is, a character
784is upper-case where it's different from the character at the same position in
785"FOL".
786
Bram Moolenaar6f16eb82005-08-23 21:02:42 +0000787An exception is made for the German sharp s ß. The upper-case version is
788"SS". In the FOL/LOW/UPP lines it should be included, so that it's recognized
789as a word character, but use the ß character in all three.
790
Bram Moolenaar0cb032e2005-04-23 20:52:00 +0000791ASCII characters should be omitted, Vim always handles these in the same way.
792When the encoding is UTF-8 no word characters need to be specified.
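
How the three lines relate can be sketched in Python. This is an
illustration only; the function name and the returned tables are assumptions
of this example and not what ":mkspell" builds internally. The arguments are
the character strings from the FOL, LOW and UPP lines.

```python
def build_case_tables(fol, low, upp):
    """Build case tables from the FOL, LOW and UPP character strings."""
    if not (len(fol) == len(low) == len(upp)):
        raise ValueError(
            "FOL, LOW and UPP must have the same number of characters")
    fold = {}    # character -> case-folded character
    upper = {}   # lower-case character -> upper-case character
    for f, l, u in zip(fol, low, upp):
        fold[l] = f
        fold[u] = f
        upper[l] = u
    # A character is upper-case where it differs from the FOL character
    # at the same position.
    is_upper = {u for f, u in zip(fol, upp) if u != f}
    return fold, upper, is_upper
```

When the sharp s is used in all three lines it is a word character, but it
does not end up in the set of upper-case characters.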
793
794 *E763*
Bram Moolenaar3b506942005-06-23 22:36:45 +0000795Vim allows you to use spell checking for several languages in the same file.
796You can list them in the 'spelllang' option. As a consequence all spell files
797for the same encoding must use the same word characters, otherwise they can't
798be combined without errors. If you get a warning that the word tables differ
799you may need to generate the .spl file again with |:mkspell|. Check the FOL,
800LOW and UPP lines in the used .aff file.
801
The XX.ascii.spl spell file generated with the "-ascii" argument will not
contain the table with characters, so that it can be combined with spell files
for any encoding. The .add.spl files also do not contain the table.
Bram Moolenaar0cb032e2005-04-23 20:52:00 +0000805
Bram Moolenaare7566042005-06-17 22:00:15 +0000806
Bram Moolenaar9a50b1b2005-06-27 22:48:21 +0000807MID-WORD CHARACTERS
808 *spell-midword*
809Some characters are only to be considered word characters if they are used in
810between two ordinary word characters. An example is the single quote: It is
811often used to put text in quotes, thus it can't be recognized as a word
812character, but when it appears in between word characters it must be part of
813the word. This is needed to detect a spelling error such as they'are. That
814should be they're, but since "they" and "are" are words themselves that would
815go unnoticed.
816
Bram Moolenaar4770d092006-01-12 23:22:24 +0000817These characters are defined with MIDWORD in the .aff file. Example:
Bram Moolenaar9a50b1b2005-06-27 22:48:21 +0000818
819 MIDWORD '- ~
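
The effect of mid-word characters on finding words can be sketched in
Python. This is an illustration only, not how Vim splits text into words;
the function name is made up and "letters" stands in for whatever counts as
a word character for the language.

```python
import re

def split_words(text, midword="'-"):
    """Split text into words; midword chars only count between letters."""
    letters = r"[^\W\d_]+"          # one or more letters
    if not midword:
        return re.findall(letters, text)
    mid = "[" + re.escape(midword) + "]"
    # A word is a run of letters, optionally continued by a single
    # mid-word character that is directly followed by more letters.
    return re.findall(f"{letters}(?:{mid}{letters})*", text)
```

With MIDWORD '- the text "they'are" is one word, which can then be flagged
as bad, while the quotes around 'quoted' are not part of the word.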
820
821
FLAG TYPES                                      *spell-FLAG*
823
824Flags are used to specify the affixes that can be used with a word and for
825other properties of the word. Normally single-character flags are used. This
826limits the number of possible flags, especially for 8-bit encodings. The FLAG
827item can be used if more affixes are to be used. Possible values:
828
    FLAG long       use two-character flags
    FLAG num        use numbers, from 1 up to 65000
    FLAG caplong    use one-character flags without A-Z and two-character
                    flags that start with A-Z
833
834With "FLAG num" the numbers in a list of affixes need to be separated with a
835comma: "234,2143,1435". This method is inefficient, but useful if the file is
836generated with a program.
837
Bram Moolenaar81f1ecb2005-08-25 21:27:31 +0000838When using "caplong" the two-character flags all start with a capital: "Aa",
839"B1", "BB", etc. This is useful to use one-character flags for the most
840common items and two-character flags for uncommon items.
Bram Moolenaar6e7c7f32005-08-24 22:16:11 +0000841
842Note: When using utf-8 only characters up to 65000 may be used for flags.
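
How a list of flags after a word is split up for each FLAG type can be
sketched in Python. This is an illustration under the rules above, not Vim's
code; the function name and the use of None for the default single-character
flags are assumptions of this example.

```python
def split_flags(text, flag_type=None):
    """Split the flags after a word according to the FLAG item."""
    if flag_type is None:
        return list(text)  # default: single-character flags
    if flag_type == "long":
        if len(text) % 2:
            raise ValueError("odd number of characters for FLAG long")
        return [text[i:i + 2] for i in range(0, len(text), 2)]
    if flag_type == "num":
        numbers = [int(part) for part in text.split(",") if part]
        if any(n < 1 or n > 65000 for n in numbers):
            raise ValueError("flag number must be from 1 up to 65000")
        return numbers
    if flag_type == "caplong":
        flags = []
        i = 0
        while i < len(text):
            # A capital A-Z starts a two-character flag.
            size = 2 if "A" <= text[i] <= "Z" else 1
            if i + size > len(text):
                raise ValueError("incomplete two-character flag")
            flags.append(text[i:i + size])
            i += size
        return flags
    raise ValueError("unknown FLAG type: " + flag_type)
```

For example, with "FLAG caplong" the flags "aBBcA1" are split into "a",
"BB", "c" and "A1".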
843
844
Bram Moolenaare13305e2005-06-19 22:54:15 +0000845AFFIXES
Bram Moolenaar6f16eb82005-08-23 21:02:42 +0000846 *spell-PFX* *spell-SFX*
Bram Moolenaare13305e2005-06-19 22:54:15 +0000847The usual PFX (prefix) and SFX (suffix) lines are supported (see the Myspell
Bram Moolenaar9a50b1b2005-06-27 22:48:21 +0000848documentation or the Aspell manual:
849http://aspell.net/man-html/Affix-Compression.html).
Bram Moolenaare13305e2005-06-19 22:54:15 +0000850
Bram Moolenaar4770d092006-01-12 23:22:24 +0000851Summary:
852 SFX L Y 2 ~
853 SFX L 0 re [^x] ~
854 SFX L 0 ro x ~
855
856The first line is a header and has four fields:
857 SFX {flag} {combine} {count}
858
859{flag} The name used for the suffix. Mostly it's a single letter,
860 but other characters can be used, see |spell-FLAG|.
861
862{combine} Can be 'Y' or 'N'. When 'Y' then the word plus suffix can
863 also have a prefix. When 'N' then a prefix is not allowed.
864
865{count} The number of lines following. If this is wrong you will get
866 an error message.
867
868For PFX the fields are exactly the same.
869
870The basic format for the following lines is:
871 SFX {flag} {strip} {add} {condition}
872
873{flag} Must be the same as the {flag} used in the first line.
874
{strip}         Characters removed from the basic word. There is no check if
                the characters are actually there, only the length is used (in
                bytes). The {strip} text had better match the {condition},
                otherwise strange things may happen. If the {strip} length is
                equal to or longer than the basic word the suffix won't be
                used.
                When {strip} is 0 (zero) then nothing is stripped.
881
882{add} Characters added to the basic word, after removing {strip}.
883
884{condition} A simplistic pattern. Only when this matches with a basic
885 word will the suffix be used for that word. This is normally
886 for using one suffix letter with different {add} and {strip}
887 fields for words with different endings.
888 When {condition} is a . (dot) there is no condition.
889 The pattern may contain:
890 - Literal characters.
891 - A set of characters in []. [abc] matches a, b and c.
892 A dash is allowed for a range [a-c], but this is
893 Vim-specific.
894 - A set of characters that starts with a ^, meaning the
895 complement of the specified characters. [^abc] matches any
896 character but a, b and c.
897
898For PFX the fields are the same, but the {strip}, {add} and {condition} apply
899to the start of the word.
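
Applying one suffix line to a basic word can be sketched in Python. This is
an illustration of the fields described above and not Vim's implementation:
the function names are made up, the {strip} length is counted in characters
instead of bytes, and only suffixes are handled (for a prefix the same steps
would apply to the start of the word).

```python
import re

def condition_to_regex(condition):
    """Translate a {condition} into the source of a Python regex."""
    parts = []
    i = 0
    while i < len(condition):
        if condition[i] == "[":
            end = condition.index("]", i)
            # A set such as [abc], [^abc] or the Vim-specific [a-c].
            body = condition[i + 1:end].replace("\\", "\\\\")
            parts.append("[" + body + "]")
            i = end + 1
        else:
            parts.append(re.escape(condition[i]))  # literal character
            i += 1
    return "".join(parts)

def apply_suffix(word, strip, add, condition):
    """Return word with the suffix applied, or None when it is not used."""
    if condition != ".":
        # The condition must match at the end of the basic word.
        if not re.search(condition_to_regex(condition) + r"\Z", word):
            return None
    count = 0 if strip == "0" else len(strip)
    if count >= len(word):
        return None  # {strip} as long as the word: suffix not used
    # Only the length of {strip} is used, the characters are not checked.
    return word[:len(word) - count] + add
```

With the line "SFX F 0 in [^i]n" the word "Spion" becomes "Spionin", while
"Bauerin" is left alone because it ends in "in".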
900
901Note: Myspell ignores any extra text after the relevant info. Vim requires
902this text to start with a "#" so that mistakes don't go unnoticed. Example:
Bram Moolenaar9a50b1b2005-06-27 22:48:21 +0000903
904 SFX F 0 in [^i]n # Spion > Spionin ~
905 SFX F 0 nen in # Bauerin > Bauerinnen ~
906
Bram Moolenaar81f1ecb2005-08-25 21:27:31 +0000907Apparently Myspell allows an affix name to appear more than once. Since this
908might also be a mistake, Vim checks for an extra "S". The affix files for
909Myspell that use this feature apparently have this flag. Example:
910
911 SFX a Y 1 S ~
912 SFX a 0 an . ~
913
914 SFX a Y 2 S ~
915 SFX a 0 en . ~
916 SFX a 0 on . ~
917
Bram Moolenaar4770d092006-01-12 23:22:24 +0000918
919AFFIX FLAGS *spell-affix-flags*
920
This is a feature that comes from Hunspell: The affix may specify flags. This
works similarly to flags specified on a basic word. The flags apply to the
basic word plus the affix. Example:
924
925 SFX S Y 1 ~
926 SFX S 0 s . ~
927
928 SFX A Y 1 ~
929 SFX A 0 able/S . ~
930
931When the dictionary file contains "drink/AS" then these words are possible:
932
933 drink
934 drinks uses S suffix
935 drinkable uses A suffix
936 drinkables uses A suffix and then S suffix
937
938Generally the flags of the suffix are added to the flags of the basic word,
939both are used for the word plus suffix. But the flags of the basic word are
940only used once for affixes, except that both one prefix and one suffix can be
941used when both support combining.
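
The "drink/AS" example can be reproduced with a small Python sketch. It is
an illustration only: the names are made up, {strip} and {condition} are left
out, prefixes are not handled, and it assumes that a suffix added through an
affix flag does not bring in further flags of its own.

```python
# Suffix flag -> list of (added text, flags specified on the affix).
SUFFIXES = {
    "S": [("s", "")],
    "A": [("able", "S")],
}

def expand(word, flags, suffixes):
    """List the words that a basic word with suffix flags stands for."""
    results = [word]
    for flag in flags:
        # The flags of the basic word are used once.
        for add, affix_flags in suffixes.get(flag, []):
            derived = word + add
            results.append(derived)
            # The flags on the affix apply to the word plus the affix.
            for flag2 in affix_flags:
                for add2, _unused in suffixes.get(flag2, []):
                    results.append(derived + add2)
    return results
```

Calling expand("drink", "AS", SUFFIXES) gives "drink", "drinkable",
"drinkables" and "drinks"; a word such as "drinkss" is not produced.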
942
943Specifically, the affix flags can be used for:
944- Affixes on affixes, as in the example above.
945- Making the word with the affix rare, by using the |spell-RARE| flag.
946- Exclude the word with the affix from compounding, by using the
947 |spell-COMPOUNDFORBIDFLAG| flag.
948
949-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
950OLD STUFF
Bram Moolenaar8aff23a2005-08-19 20:40:30 +0000951 *spell-affix-rare*
Bram Moolenaar9a50b1b2005-06-27 22:48:21 +0000952An extra item for Vim is the "rare" flag. It must come after the other
953fields, before a comment. When used then all words that use the affix will be
Bram Moolenaar4770d092006-01-12 23:22:24 +0000954marked as rare words. Examples:
Bram Moolenaar9a50b1b2005-06-27 22:48:21 +0000955
956 PFX F 0 nene . rare ~
957 SFX F 0 oin n rare # hardly ever used ~
958
Bram Moolenaar4770d092006-01-12 23:22:24 +0000959However, if the word also appears as a good word in another way (e.g., in
960another region) it won't be marked as rare.
Bram Moolenaare13305e2005-06-19 22:54:15 +0000961
Bram Moolenaar8aff23a2005-08-19 20:40:30 +0000962 *spell-affix-nocomp*
963Another extra item for Vim is the "nocomp" flag. It must come after the other
Bram Moolenaar90915b52005-08-21 22:17:52 +0000964fields, before a comment. It can be either before or after "rare". When
965present then all words that use the affix will not be part of a compound word.
Bram Moolenaar8aff23a2005-08-19 20:40:30 +0000966Example:
967 affix file:
968 COMPOUNDFLAG c ~
969 SFX a Y 2 ~
970 SFX a 0 s . ~
971 SFX a 0 ize . nocomp ~
972 dictionary:
973 word/c ~
974 util/ac ~
975
976This allows for "wordutil" and "wordutils" but not "wordutilize".
Bram Moolenaar4770d092006-01-12 23:22:24 +0000977-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Bram Moolenaar8aff23a2005-08-19 20:40:30 +0000978
Bram Moolenaar6f16eb82005-08-23 21:02:42 +0000979 *spell-PFXPOSTPONE*
Bram Moolenaare13305e2005-06-19 22:54:15 +0000980When an affix file has very many prefixes that apply to many words it's not
981possible to build the whole word list in memory. This applies to Hebrew (a
982list with all words is over a Gbyte). In that case applying prefixes must be
983postponed. This makes spell checking slower. It is indicated by this keyword
984in the .aff file:
985
986 PFXPOSTPONE ~
987
Only prefixes without a chop string can be postponed, prefixes with a chop
string will still be included in the word list. An exception is made when the
chop string is one character and equal to the last character of the added
string, but in lower case. This is the case when the chop string is used to
allow the following word to start with an upper case letter.
Bram Moolenaare13305e2005-06-19 22:54:15 +0000993
Bram Moolenaarae5bce12005-08-15 21:41:48 +0000994
WORDS WITH A SLASH                              *spell-SLASH*
Bram Moolenaarae5bce12005-08-15 21:41:48 +0000996
997The slash is used in the .dic file to separate the basic word from the affix
998letters that can be used. Unfortunately, this means you cannot use a slash in
999a word. Thus "TCP/IP" cannot be a word. To work around that you can define a
1000replacement character for the slash. Example:
1001
1002 SLASH , ~
1003
1004Now you can use "TCP,IP" to add the word "TCP/IP".
1005
Of course, the character used should itself not appear in any word! The
character must be ASCII, thus a single byte.
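
Reading one line of the .dic file with a SLASH replacement can be sketched
in Python. This is an illustration only; the function name is made up, and
the first line of the file (the word count) is expected to be handled by the
caller.

```python
def parse_dic_entry(line, slash_char=None):
    """Return (word, flags) for a .dic line, or None for a comment."""
    line = line.rstrip()  # white space at the end of the line is ignored
    if not line or line[0] in "#/":
        return None  # comment lines start with '#' or '/'
    # The slash separates the basic word from the flags.
    word, _slash, flags = line.partition("/")
    if slash_char:
        # The character from the SLASH item stands for a real slash.
        word = word.replace(slash_char, "/")
    return word, flags
```

With "SLASH ," in the affix file the line "TCP,IP" results in the word
"TCP/IP". Without the SLASH item the word stays "TCP,IP".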
1008
1009
KEEP-CASE WORDS                                 *spell-KEEPCASE*
Bram Moolenaarae5bce12005-08-15 21:41:48 +00001011
Bram Moolenaar1cbe5f72005-12-29 22:51:09 +00001012In the affix file a KEEPCASE line can be used to define the affix name used
1013for keep-case words. Example:
Bram Moolenaar45eeb132005-06-06 21:59:07 +00001014
Bram Moolenaar1cbe5f72005-12-29 22:51:09 +00001015 KEEPCASE = ~
Bram Moolenaar45eeb132005-06-06 21:59:07 +00001016
Bram Moolenaar4770d092006-01-12 23:22:24 +00001017This flag is not supported by Myspell. It has the meaning that case matters.
1018This can be used if the word does not have the first letter in upper case at
1019the start of a sentence. Example:
1020
    word list       matches                 does not match ~
    's morgens/=    's morgens              'S morgens 's Morgens 'S MORGENS
    's Morgens      's Morgens 'S MORGENS   'S morgens 's morgens
1024
The flag can also be used to prevent the word from matching when it is written
in all upper-case letters.
Bram Moolenaar45eeb132005-06-06 21:59:07 +00001027
Bram Moolenaare13305e2005-06-19 22:54:15 +00001028
RARE WORDS                                      *spell-RARE*
Bram Moolenaarae5bce12005-08-15 21:41:48 +00001030
Bram Moolenaar1cbe5f72005-12-29 22:51:09 +00001031In the affix file a RARE line can be used to define the affix name used for
Bram Moolenaar45eeb132005-06-06 21:59:07 +00001032rare words. Example:
1033
Bram Moolenaar1cbe5f72005-12-29 22:51:09 +00001034 RARE ? ~
Bram Moolenaar45eeb132005-06-06 21:59:07 +00001035
1036Rare words are highlighted differently from bad words. This is to be used for
1037words that are correct for the language, but are hardly ever used and could be
Bram Moolenaar30abd282005-06-22 22:35:10 +00001038a typing mistake anyway. When the same word is found as good it won't be
1039highlighted as rare.
1040
1041
BAD WORDS                                       *spell-BAD*
Bram Moolenaarae5bce12005-08-15 21:41:48 +00001043
Bram Moolenaar30abd282005-06-22 22:35:10 +00001044In the affix file a BAD line can be used to define the affix name used for
1045bad words. Example:
1046
1047 BAD ! ~
1048
1049This can be used to exclude words that would otherwise be good. For example
Bram Moolenaar9a50b1b2005-06-27 22:48:21 +00001050"the the" in the .dic file:
1051
1052 the the/! ~
1053
1054Once a word has been marked as bad it won't be undone by encountering the same
1055word as good.
Bram Moolenaar45eeb132005-06-06 21:59:07 +00001056
Bram Moolenaar4770d092006-01-12 23:22:24 +00001057The flag also applies to the word with affixes, thus this can be used to mark
1058a whole bunch of related words as bad.
1059
Bram Moolenaar6f16eb82005-08-23 21:02:42 +00001060 *spell-NEEDAFFIX*
Bram Moolenaar8aff23a2005-08-19 20:40:30 +00001061The NEEDAFFIX flag is used to require that a word is used with an affix. The
Bram Moolenaar4770d092006-01-12 23:22:24 +00001062word itself is not a good word (unless there is an empty affix). Example:
Bram Moolenaar8aff23a2005-08-19 20:40:30 +00001063
1064 NEEDAFFIX + ~
1065
Bram Moolenaar45eeb132005-06-06 21:59:07 +00001066
COMPOUND WORDS                                  *spell-compound*
Bram Moolenaarae5bce12005-08-15 21:41:48 +00001068
Bram Moolenaar8aff23a2005-08-19 20:40:30 +00001069A compound word is a longer word made by concatenating words that appear in
1070the .dic file. To specify which words may be concatenated a character is
1071used. This character is put in the list of affixes after the word. We will
1072call this character a flag here. Obviously these flags must be different from
1073any affix IDs used.
Bram Moolenaarae5bce12005-08-15 21:41:48 +00001074
1075 *spell-COMPOUNDFLAG*
Bram Moolenaar4770d092006-01-12 23:22:24 +00001076The Myspell compatible method uses one flag, specified with COMPOUNDFLAG. All
1077words with this flag combine in any order. This means there is no control
Bram Moolenaar8aff23a2005-08-19 20:40:30 +00001078over which word comes first. Example:
Bram Moolenaarae5bce12005-08-15 21:41:48 +00001079 COMPOUNDFLAG c ~
1080
1081 *spell-COMPOUNDFLAGS*
Bram Moolenaar8aff23a2005-08-19 20:40:30 +00001082A more advanced method to specify how compound words can be formed uses
1083multiple items with multiple flags. This is not compatible with Myspell 3.0.
1084Let's start with an example:
1085 COMPOUNDFLAGS c+ ~
1086 COMPOUNDFLAGS se ~
Bram Moolenaarae5bce12005-08-15 21:41:48 +00001087
Bram Moolenaar8aff23a2005-08-19 20:40:30 +00001088The first line defines that words with the "c" flag can be concatenated in any
1089order. The second line defines compound words that are made of one word with
1090the "s" flag and one word with the "e" flag. With this dictionary:
1091 bork/c ~
1092 onion/s ~
1093 soup/e ~
Bram Moolenaarae5bce12005-08-15 21:41:48 +00001094
Bram Moolenaar8aff23a2005-08-19 20:40:30 +00001095You can make these words:
1096 bork
1097 borkbork
1098 borkborkbork
1099 (etc.)
Bram Moolenaarae5bce12005-08-15 21:41:48 +00001100 onion
1101 soup
Bram Moolenaarae5bce12005-08-15 21:41:48 +00001102 onionsoup
Bram Moolenaarae5bce12005-08-15 21:41:48 +00001103
Bram Moolenaar8aff23a2005-08-19 20:40:30 +00001104The COMPOUNDFLAGS item may appear multiple times. The argument is made out of
1105one or more groups, where each group can be:
1106 one flag e.g., c
1107 alternate flags inside [] e.g., [abc]
1108Optionally this may be followed by:
1109 * the group appears zero or more times, e.g., sm*e
1110 + the group appears one or more times, e.g., c+
Bram Moolenaarae5bce12005-08-15 21:41:48 +00001111
Bram Moolenaar8aff23a2005-08-19 20:40:30 +00001112This is similar to the regexp pattern syntax (but not the same!). A few
1113examples with the sequence of word flags they require:
    COMPOUNDFLAGS x+        x xx xxx etc.
    COMPOUNDFLAGS yz        yz
    COMPOUNDFLAGS x+z       xz xxz xxxz etc.
    COMPOUNDFLAGS yx+       yx yxx yxxx etc.

    COMPOUNDFLAGS [abc]z    az bz cz
    COMPOUNDFLAGS [abc]+z   az aaz abaz bz baz bcbz cz caz cbaz etc.
    COMPOUNDFLAGS a[xyz]+   ax axx axyz ay ayx ayzz az azy azxy etc.
    COMPOUNDFLAGS sm*e      se sme smme smmme etc.
    COMPOUNDFLAGS s[xyz]*e  se sxe sxye sxyxe sye syze sze szye szyxe etc.
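
Checking a sequence of word flags against one COMPOUNDFLAGS item can be
sketched in Python by translating the item into a regular expression. This
is an illustration only, not how Vim stores or matches the items; the
function names are made up and single-character flags are assumed.

```python
import re

def compound_item_to_regex(item):
    """Translate one COMPOUNDFLAGS argument into a compiled regex."""
    parts = []
    i = 0
    while i < len(item):
        if item[i] == "[":
            end = item.index("]", i)
            # Alternate flags inside [].
            group = "[" + re.escape(item[i + 1:end]) + "]"
            i = end + 1
        else:
            group = re.escape(item[i])  # one flag
            i += 1
        if i < len(item) and item[i] in "*+":
            group += item[i]  # zero or more / one or more times
            i += 1
        parts.append(group)
    return re.compile("".join(parts))

def flags_allowed(item, flag_sequence):
    """True when the flags of the words, in order, fit the item."""
    return compound_item_to_regex(item).fullmatch(flag_sequence) is not None
```

Here the flag sequence is a string with one flag per word of the compound.
For example, flags_allowed("sm*e", "smme") is True and
flags_allowed("sm*e", "sm") is False.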
Bram Moolenaarae5bce12005-08-15 21:41:48 +00001124
Bram Moolenaara6c840d2005-08-22 22:59:46 +00001125A specific example: Allow a compound to be made of two words and a dash:
1126 In the .aff file:
1127 COMPOUNDFLAGS sde ~
1128 NEEDAFFIX x ~
1129 COMPOUNDMAX 3 ~
1130 COMPOUNDMIN 1 ~
1131 In the .dic file:
1132 start/s ~
1133 end/e ~
1134 -/xd ~
1135
1136This allows for the word "start-end", but not "startend".
1137
Bram Moolenaar4770d092006-01-12 23:22:24 +00001138 *spell-NEEDCOMPOUND*
1139The NEEDCOMPOUND flag is used to require that a word is used as part of a
1140compound word. The word itself is not a good word. Example:
1141
1142 NEEDCOMPOUND & ~
1143
Bram Moolenaarae5bce12005-08-15 21:41:48 +00001144 *spell-COMPOUNDMIN*
Bram Moolenaarac6e65f2005-08-29 22:25:38 +00001145The minimal character length of a word used for compounding is specified with
Bram Moolenaarae5bce12005-08-15 21:41:48 +00001146COMPOUNDMIN. Example:
1147 COMPOUNDMIN 5 ~
1148
When omitted there is no minimal length. Obviously you could just leave out
the compound flag from short words instead; this feature is present for
compatibility with Myspell.
Bram Moolenaarae5bce12005-08-15 21:41:48 +00001152
Bram Moolenaar8aff23a2005-08-19 20:40:30 +00001153 *spell-COMPOUNDMAX*
1154The maximum number of words that can be concatenated into a compound word is
1155specified with COMPOUNDMAX. Example:
1156 COMPOUNDMAX 3 ~
Bram Moolenaarae5bce12005-08-15 21:41:48 +00001157
Bram Moolenaar8aff23a2005-08-19 20:40:30 +00001158When omitted there is no maximum. It applies to all compound words.
Bram Moolenaarae5bce12005-08-15 21:41:48 +00001159
Bram Moolenaar8aff23a2005-08-19 20:40:30 +00001160To set a limit for words with specific flags make sure the items in
1161COMPOUNDFLAGS where they appear don't allow too many words.
Bram Moolenaarae5bce12005-08-15 21:41:48 +00001162
Bram Moolenaar8aff23a2005-08-19 20:40:30 +00001163 *spell-COMPOUNDSYLMAX*
1164The maximum number of syllables that a compound word may contain is specified
1165with COMPOUNDSYLMAX. Example:
1166 COMPOUNDSYLMAX 6 ~
Bram Moolenaarae5bce12005-08-15 21:41:48 +00001167
Bram Moolenaar8aff23a2005-08-19 20:40:30 +00001168This has no effect if there is no SYLLABLE item. Without COMPOUNDSYLMAX there
1169is no limit on the number of syllables.
Bram Moolenaarae5bce12005-08-15 21:41:48 +00001170
Bram Moolenaara6c840d2005-08-22 22:59:46 +00001171If both COMPOUNDMAX and COMPOUNDSYLMAX are defined, a compound word is
1172accepted if it fits one of the criteria, thus is either made from up to
1173COMPOUNDMAX words or contains up to COMPOUNDSYLMAX syllables.
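
The way the two limits combine can be written down as a short Python sketch.
The function and its arguments are made up for this illustration. None
stands for an item that is not present in the affix file.

```python
def compound_size_ok(n_words, n_syllables, compoundmax=None, compoundsylmax=None):
    """Check a compound against COMPOUNDMAX and COMPOUNDSYLMAX.

    Hypothetical helper: None means the item was not given.  When both
    limits are given, fitting either one is enough.
    """
    checks = []
    if compoundmax is not None:
        checks.append(n_words <= compoundmax)
    if compoundsylmax is not None:
        checks.append(n_syllables <= compoundsylmax)
    # No limit given at all: everything is accepted.
    return not checks or any(checks)


# COMPOUNDMAX 3 and COMPOUNDSYLMAX 6:
print(compound_size_ok(4, 5, 3, 6))  # True, too many words but few syllables
print(compound_size_ok(4, 7, 3, 6))  # False, fits neither limit
```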
1174
Bram Moolenaar4770d092006-01-12 23:22:24 +00001175 *spell-COMPOUNDFORBIDFLAG*
1176The COMPOUNDFORBIDFLAG specifies a flag that can be used on an affix. It
1177means that the word plus affix cannot be used in a compound word.
1178NOT IMPLEMENTED YET.
1179
1180 *spell-COMPOUNDPERMITFLAG*
1181The COMPOUNDPERMITFLAG specifies a flag that can be used on an affix. It
1182means that the word plus affix can also be used in a compound word in a way
where the affix ends up in the middle of the word.
1184NOT IMPLEMENTED YET.
1185
Bram Moolenaar8aff23a2005-08-19 20:40:30 +00001186 *spell-SYLLABLE*
1187The SYLLABLE item defines characters or character sequences that are used to
1188count the number of syllables in a word. Example:
1189 SYLLABLE aáeéiíoóöõuúüûy/aa/au/ea/ee/ei/ie/oa/oe/oo/ou/uu/ui ~
Bram Moolenaarae5bce12005-08-15 21:41:48 +00001190
Bram Moolenaar8aff23a2005-08-19 20:40:30 +00001191Before the first slash is the set of characters that are counted for one
1192syllable, also when repeated and mixed, until the next character that is not
1193in this set. After the slash come sequences of characters that are counted
1194for one syllable. These are preferred over using characters from the set.
With the example, "ideeen" has three syllables, counted by "i", "ee" and "e".
1196
1197Only case-folded letters need to be included.
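
The counting rule can be sketched in Python. This follows the description
above and is not taken from Vim's source; the function names are
hypothetical. The SYLLABLE value is the one from the example.

```python
SYLLABLE = "aáeéiíoóöõuúüûy/aa/au/ea/ee/ei/ie/oa/oe/oo/ou/uu/ui"


def parse_syllable(item):
    """Split a SYLLABLE argument into (set of characters, list of sequences)."""
    chars, *sequences = item.split("/")
    return set(chars), sequences


def count_syllables(word, item):
    """Count syllables in a case-folded word as described in the text."""
    chars, sequences = parse_syllable(item)
    count = 0
    in_run = False  # True while inside a run of characters from the set
    i = 0
    while i < len(word):
        # Sequences are preferred over single characters from the set.
        seq = next((s for s in sequences if s and word.startswith(s, i)), None)
        if seq is not None:
            count += 1
            i += len(seq)
            in_run = False
            continue
        if word[i] in chars:
            if not in_run:
                # A run of set characters counts as one syllable.
                count += 1
                in_run = True
        else:
            in_run = False
        i += 1
    return count


print(count_syllables("ideeen", SYLLABLE))  # 3: "i", "ee" and "e"
```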
1198
Another way to restrict compounding was mentioned above: adding "nocomp"
after an affix causes all words that are made with that affix to not be used
for compounding. |spell-affix-nocomp|
1202
Bram Moolenaar6f16eb82005-08-23 21:02:42 +00001203
1204UNLIMITED COMPOUNDING *spell-NOBREAK*
1205
1206For some languages, such as Thai, there is no space in between words. This
1207looks like all words are compounded. To specify this use the NOBREAK item in
1208the affix file, without arguments:
1209 NOBREAK ~
1210
Vim will try to figure out where one word ends and the next starts. When there
1212are spelling mistakes this may not be quite right.
1213
Bram Moolenaar8aff23a2005-08-19 20:40:30 +00001214>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
1215NOTE: The following has not been implemented yet, because there are no word
1216lists that support this.
1217> *spell-CMP*
1218> Sometimes it is necessary to change a word when concatenating it to another,
1219> by removing a few letters, inserting something or both. It can also be useful
1220> to restrict concatenation to words that match a pattern. For this purpose CMP
1221> items can be used. They look like this:
1222> CMP {flag} {flags} {strip} {strip2} {add} {cond} {cond2}
1223>
1224> {flag} the flag, as used in COMPOUNDFLAGS for the lead word
1225> {flags} accepted flags for the following word ('.' to accept
1226> all)
1227> {strip} text to remove from the end of the lead word (zero
1228> for no stripping)
1229> {strip2} text to remove from the start of the following word
1230> (zero for no stripping)
1231> {add} text to insert between the words (zero for no
1232> addition)
1233> {cond} condition to match at the end of the lead word
1234> {cond2} condition to match at the start of the following word
1235>
1236> This is the same as what is used for SFX and PFX items, with the extra {flags}
1237> and {cond2} fields. Example:
1238> CMP f mrt 0 - . . ~
1239>
1240> When used with the food and dish word list above, this means that a dash is
1241> inserted after each food item. Thus you get "onion-soup" and
1242> "onion-tomato-salat".
1243>
1244> When there are CMP items for a compound flag the concatenation is only done
1245> when a CMP item matches.
1246>
1247> When there are no CMP items for a compound flag, then all words will be
1248> concatenated, as if there was an item:
1249> CMP {flag} . 0 0 . .
1250>
1251>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
Bram Moolenaarae5bce12005-08-15 21:41:48 +00001252
1253
Bram Moolenaar4770d092006-01-12 23:22:24 +00001254 *spell-COMMON*
1255Common words can be specified with the COMMON item. This will give better
1256suggestions when editing a short file. Example:
1257
1258 COMMON the of to and a in is it you that he was for on are ~
1259
1260The words must be separated by white space, up to 25 per line.
1261When multiple regions are specified in a ":mkspell" command the common words
1262for all regions are combined and used for all regions.
1263
1264 *spell-NOSPLITSUGS*
1265This item indicates that suggestions for splitting a word will not appear:
1266
1267 NOSPLITSUGS ~
1268
1269 *spell-NOSUGGEST*
1270The flag specified with NOSUGGEST can be used for words that will not be
1271suggested. Can be used for obscene words.
1272
1273 NOSUGGEST % ~
1274
1275NOT IMPLEMENTED YET.
1276
1277
Bram Moolenaar6f16eb82005-08-23 21:02:42 +00001278REPLACEMENTS *spell-REP*
Bram Moolenaar9ba0eb82005-06-13 22:28:56 +00001279
1280In the affix file REP items can be used to define common mistakes. This is
1281used to make spelling suggestions. The items define the "from" text and the
1282"to" replacement. Example:
1283
1284 REP 4 ~
1285 REP f ph ~
1286 REP ph f ~
1287 REP k ch ~
1288 REP ch k ~
1289
Bram Moolenaar6e7c7f32005-08-24 22:16:11 +00001290The first line specifies the number of REP lines following. Vim ignores the
Bram Moolenaar4770d092006-01-12 23:22:24 +00001291number, but it must be there (for compatibility with Myspell).
Bram Moolenaar6e7c7f32005-08-24 22:16:11 +00001292
Bram Moolenaard042c562005-06-30 22:04:15 +00001293Don't include simple one-character replacements or swaps. Vim will try these
1294anyway. You can include whole words if you want to, but you might want to use
1295the "file:" item in 'spellsuggest' instead.
Bram Moolenaar9ba0eb82005-06-13 22:28:56 +00001296
Bram Moolenaar1e015462005-09-25 22:16:38 +00001297You can include a space by using an underscore:
1298
1299 REP the_the the ~
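
How REP items lead to candidate words can be sketched in Python. This only
shows the idea of applying each "from" to "to" replacement at every position
where it matches. How Vim scores and filters the results is not shown, and
the function names are hypothetical.

```python
def parse_rep(lines):
    """Return (from, to) pairs from REP lines; the count line is skipped."""
    pairs = []
    for line in lines:
        fields = line.split()
        # "REP 4" has only two fields and is ignored, like Vim ignores it.
        if len(fields) == 3 and fields[0] == "REP":
            # An underscore stands for a space.
            pairs.append((fields[1].replace("_", " "),
                          fields[2].replace("_", " ")))
    return pairs


def rep_candidates(word, pairs):
    """Apply every replacement at every matching position, one at a time."""
    found = []
    for src, dst in pairs:
        start = word.find(src)
        while start != -1:
            candidate = word[:start] + dst + word[start + len(src):]
            if candidate not in found:
                found.append(candidate)
            start = word.find(src, start + 1)
    return found


pairs = parse_rep(["REP 4", "REP f ph", "REP ph f", "REP k ch", "REP ch k"])
print(rep_candidates("fone", pairs))      # ['phone']
print(rep_candidates("kemistry", pairs))  # ['chemistry']
```

Only the candidates that turn out to be good words would be useful as
suggestions.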
1300
Bram Moolenaar9ba0eb82005-06-13 22:28:56 +00001301
Bram Moolenaar6f16eb82005-08-23 21:02:42 +00001302SIMILAR CHARACTERS *spell-MAP*
Bram Moolenaar9ba0eb82005-06-13 22:28:56 +00001303
Bram Moolenaard042c562005-06-30 22:04:15 +00001304In the affix file MAP items can be used to define letters that are very much
Bram Moolenaar9ba0eb82005-06-13 22:28:56 +00001305alike. This is mostly used for a letter with different accents. This is used
1306to prefer suggestions with these letters substituted. Example:
1307
1308 MAP 2 ~
1309 MAP eéëêè ~
1310 MAP uüùúû ~
1311
Bram Moolenaar6e7c7f32005-08-24 22:16:11 +00001312The first line specifies the number of MAP lines following. Vim ignores the
1313number, but the line must be there.
Bram Moolenaar9ba0eb82005-06-13 22:28:56 +00001314
Bram Moolenaard042c562005-06-30 22:04:15 +00001315Each letter must appear in only one of the MAP items. It's a bit more
1316efficient if the first letter is ASCII or at least one without accents.
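
The effect of MAP items can be illustrated with a Python sketch that maps
each letter to the first letter of its group and then compares two words.
This is a simplification under the assumption that similar words have the
same length. It is not Vim's suggestion code and the names are hypothetical.

```python
def parse_map(lines):
    """Map every letter in a MAP group to the first letter of that group."""
    groups = [line.split()[1] for line in lines if line.split()[:1] == ["MAP"]]
    canon = {}
    for group in groups[1:]:  # the first MAP line only holds the count
        for ch in group:
            if ch in canon:
                raise ValueError("letter appears in more than one MAP item: " + ch)
            canon[ch] = group[0]
    return canon


def similar(a, b, canon):
    """True when a and b differ only in letters from the same MAP group."""
    return len(a) == len(b) and all(
        canon.get(x, x) == canon.get(y, y) for x, y in zip(a, b))


canon = parse_map(["MAP 2", "MAP eéëêè", "MAP uüùúû"])
print(similar("eleve", "élève", canon))  # True
print(similar("eleve", "elevu", canon))  # False, e and u are not alike
```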
Bram Moolenaare7566042005-06-17 22:00:15 +00001317
Bram Moolenaar9ba0eb82005-06-13 22:28:56 +00001318
Bram Moolenaar4770d092006-01-12 23:22:24 +00001319.SUG FILE *spell-NOSUGFILE*
1320
When soundfolding is specified in the affix file then ":mkspell" will normally
produce a .sug file next to the .spl file. This is used to find suggestions by
their sound-a-like form quickly, at the cost of a lot of memory.
1324
1325To avoid producing a .sug file use this item in the affix file:
1326
1327 NOSUGFILE ~
1328
1329
Bram Moolenaar6f16eb82005-08-23 21:02:42 +00001330SOUND-A-LIKE *spell-SAL*
Bram Moolenaar9ba0eb82005-06-13 22:28:56 +00001331
1332In the affix file SAL items can be used to define the sounds-a-like mechanism
1333to be used. The main items define the "from" text and the "to" replacement.
Bram Moolenaard042c562005-06-30 22:04:15 +00001334Simplistic example:
Bram Moolenaar9ba0eb82005-06-13 22:28:56 +00001335
1336 SAL CIA X ~
1337 SAL CH X ~
1338 SAL C K ~
1339 SAL K K ~
1340
There are a few rules and this can become quite complicated. An explanation
of how it works can be found in the Aspell manual:
Bram Moolenaar42eeac32005-06-29 22:40:58 +00001343http://aspell.net/man-html/Phonetic-Code.html.
Bram Moolenaar9ba0eb82005-06-13 22:28:56 +00001344
1345There are a few special items:
1346
1347 SAL followup true ~
1348 SAL collapse_result true ~
1349 SAL remove_accents true ~
1350
1351"1" has the same meaning as "true". Any other value means "false".
1352
Bram Moolenaar42eeac32005-06-29 22:40:58 +00001353
Bram Moolenaar6f16eb82005-08-23 21:02:42 +00001354SIMPLE SOUNDFOLDING *spell-SOFOFROM* *spell-SOFOTO*
Bram Moolenaar42eeac32005-06-29 22:40:58 +00001355
1356The SAL mechanism is complex and slow. A simpler mechanism is mapping all
1357characters to another character, mapping similar sounding characters to the
same character. At the same time this does case folding. You cannot have
Bram Moolenaard042c562005-06-30 22:04:15 +00001359both SAL items and simple soundfolding.
Bram Moolenaar42eeac32005-06-29 22:40:58 +00001360
Bram Moolenaar7d1f5db2005-07-03 21:39:27 +00001361There are two items required: one to specify the characters that are mapped
Bram Moolenaar42eeac32005-06-29 22:40:58 +00001362and one that specifies the characters they are mapped to. They must have
1363exactly the same number of characters. Example:
1364
1365 SOFOFROM abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ ~
1366 SOFOTO ebctefghejklnnepkrstevvkesebctefghejklnnepkrstevvkes ~
1367
1368In the example all vowels are mapped to the same character 'e'. Another
Bram Moolenaard042c562005-06-30 22:04:15 +00001369method would be to leave out all vowels. Some characters that sound nearly
1370the same and are often mixed up, such as 'm' and 'n', are mapped to the same
character. Don't do this too much, or all words will start looking alike.
Bram Moolenaar42eeac32005-06-29 22:40:58 +00001372
1373Characters that do not appear in SOFOFROM will be left out, except that all
1374white space is replaced by one space. Sequences of the same character in
1375SOFOFROM are replaced by one.
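
The mapping can be tried outside of Vim with a Python sketch. It reads
"sequences of the same character" as runs of identical characters in the
result, which is an assumption; use the |soundfold()| function to see what
Vim really does. The function name is hypothetical.

```python
SOFOFROM = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
SOFOTO = "ebctefghejklnnepkrstevvkesebctefghejklnnepkrstevvkes"


def simple_soundfold(text, sofofrom=SOFOFROM, sofoto=SOFOTO):
    """Fold text with a SOFOFROM/SOFOTO style character mapping."""
    if len(sofofrom) != len(sofoto):
        raise ValueError("SOFOFROM and SOFOTO must have the same length")
    table = dict(zip(sofofrom, sofoto))
    result = []
    for ch in text:
        if ch.isspace():
            out = " "            # all white space becomes one space
        elif ch in table:
            out = table[ch]
        else:
            continue             # characters not in SOFOFROM are left out
        # Assumption: a run of the same resulting character is kept once.
        if not result or result[-1] != out:
            result.append(out)
    return "".join(result)


print(simple_soundfold("Moon"))  # nen
print(simple_soundfold("man"))   # nen, so it sounds like "Moon" here
```

With this mapping "Moon" and "man" fold to the same text, which shows how
quickly words start looking alike when many characters are merged.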
1376
1377You can use the |soundfold()| function to try out the results. Or set the
Bram Moolenaarcc016f52005-12-10 20:23:46 +00001378'verbose' option to see the score in the output of the |z=| command.
Bram Moolenaar42eeac32005-06-29 22:40:58 +00001379
1380
Bram Moolenaar4770d092006-01-12 23:22:24 +00001381UNSUPPORTED ITEMS *spell-affix-not-supported*
1382
1383These items appear in the affix file of other spell checkers. In Vim they are
1384ignored, not supported or defined in another way.
1385
1386ACCENT (Hunspell) *spell-ACCENT*
1387 Use MAP instead. |spell-MAP|
1388
1389CHECKCOMPOUNDCASE (Hunspell) *spell-CHECKCOMPOUNDCASE*
1390 Disallow uppercase letters at compound word boundaries.
1391 Not supported.
1392
1393CHECKCOMPOUNDDUP (Hunspell) *spell-CHECKCOMPOUNDDUP*
1394 Disallow using the same word twice in a compound. Not
1395 supported.
1396
1397CHECKCOMPOUNDREP (Hunspell) *spell-CHECKCOMPOUNDREP*
                Forbid a compound when a REP replacement turns it into a
                non-compound word. Not supported.
1400
1401CHECKCOMPOUNDTRIPLE (Hunspell) *spell-CHECKCOMPOUNDTRIPLE*
1402 Forbid three identical characters when compounding. Not
1403 supported.
1404
1405CHECKCOMPOUNDPATTERN (Hunspell) *spell-CHECKCOMPOUNDPATTERN*
1406 Forbid compounding when patterns match. Not supported.
1407
1408CIRCUMFIX (Hunspell) *spell-CIRCUMFIX*
1409 This means a prefix and suffix must be added at the same time.
                Instead only specify the suffix, and give that suffix two
1411 flags: The required prefix and the NEEDAFFIX flag.
1412 |spell-NEEDAFFIX|
1413
1414COMPLEXPREFIXES (Hunspell) *spell-COMPLEXPREFIXES*
1415 Enables using two prefixes. Not supported.
1416
1417COMPOUNDBEGIN (Hunspell) *spell-COMPOUNDBEGIN*
1418 Use COMPOUNDFLAGS instead. |spell-COMPOUNDFLAGS|
1419
1420COMPOUNDEND (Hunspell) *spell-COMPOUNDEND*
1421 Use COMPOUNDFLAGS instead. |spell-COMPOUNDFLAGS|
1422
1423COMPOUNDMIDDLE (Hunspell) *spell-COMPOUNDMIDDLE*
1424 Use COMPOUNDFLAGS instead. |spell-COMPOUNDFLAGS|
1425
1426COMPOUNDROOT (Hunspell) *spell-COMPOUNDROOT*
1427 Flag for words in the dictionary that are already a compound.
1428 Vim doesn't use it.
1429
1430COMPOUNDSYLLABLE (Hunspell) *spell-COMPOUNDSYLLABLE*
1431 Use SYLLABLE and COMPOUNDSYLMAX instead. |spell-SYLLABLE|
1432 |spell-COMPOUNDSYLMAX|
1433
1434COMPOUNDWORDMAX (Hunspell) *spell-COMPOUNDWORDMAX*
1435 Use COMPOUNDMAX instead. |spell-COMPOUNDMAX|
1436
1437FORBIDDENWORD (Hunspell) *spell-FORBIDDENWORD*
1438 Use BAD instead. |spell-BAD|
1439
1440HOME (Hunspell) *spell-HOME*
1441 Specifies the website for the language. Not supported.
1442
1443LANG (Hunspell) *spell-LANG*
1444 This specifies language-specific behavior. This actually
1445 moves part of the language knowledge into the program,
1446 therefore Vim does not support it. Each language property
1447 must be specified separately.
1448
1449LEMMA_PRESENT (Hunspell) *spell-LEMMA_PRESENT*
                Only needed for morphological analysis.
1451
1452MAXNGRAMSUGS (Hunspell) *spell-MAXNGRAMSUGS*
1453 Not supported.
1454
1455NAME (Hunspell) *spell-NAME*
1456 Specifies the name of the language. Not supported.
1457
1458ONLYINCOMPOUND (Hunspell) *spell-ONLYINCOMPOUND*
1459 Use NEEDCOMPOUND instead. |spell-NEEDCOMPOUND|
1460
1461PSEUDOROOT (Hunspell) *spell-PSEUDOROOT*
1462 Use NEEDAFFIX instead. |spell-NEEDAFFIX|
1463
1464SUGSWITHDOTS (Hunspell) *spell-SUGSWITHDOTS*
1465 Adds dots to suggestions. Vim doesn't need this.
1466
1467SYLLABLENUM (Hunspell) *spell-SYLLABLENUM*
1468 Not supported.
1469
1470TRY (Myspell, Hunspell, others) *spell-TRY*
1471 Vim does not use the TRY item, it is ignored. For making
1472 suggestions the actual characters in the words are used.
1473
1474VERSION (Hunspell) *spell-VERSION*
1475 Specifies the version for the language. Not supported.
1476
1477WORDCHARS (Hunspell) *spell-WORDCHARS*
1478 Used to recognize words. Vim doesn't need it, because there
1479 is no need to separate words before checking them (using a
1480 trie instead of a hashtable).
1481
 vim:tw=78:sw=4:ts=8:ft=help:norl: