*spell.txt*     For Vim version 7.0aa.  Last change: 2005 Aug 15


                  VIM REFERENCE MANUAL    by Bram Moolenaar


Spell checking                                          *spell*

1. Quick start                  |spell-quickstart|
2. Remarks on spell checking    |spell-remarks|
3. Generating a spell file      |spell-mkspell|
4. Spell file format            |spell-file-format|

{Vi does not have any of these commands}

Spell checking is not available when the |+syntax| feature has been disabled
at compile time.

==============================================================================
1. Quick start                                          *spell-quickstart*

This command switches on spell checking: >

        :setlocal spell spelllang=en_us

This switches on the 'spell' option and specifies to check for US English.

The words that are not recognized are highlighted with one of these:
        SpellBad        word not recognized                 |hl-SpellBad|
        SpellCap        word not capitalised                |hl-SpellCap|
        SpellRare       rare word                           |hl-SpellRare|
        SpellLocal      wrong spelling for selected region  |hl-SpellLocal|

Vim only checks the spelling of words; there is no grammar check.

If the 'mousemodel' option is set to "popup" and the cursor is on a badly
spelled word, or it is set to "popup_setpos" and the mouse pointer is on a
badly spelled word, then the popup menu will contain a submenu to replace the
bad word.  Note: this slows down the appearance of the popup menu.

To search for the next misspelled word:

                                                        *]s* *E756*
]s                      Move to next misspelled word after the cursor.
                        A count before the command can be used to repeat.

                                                        *[s*
[s                      Like "]s" but search backwards, find the misspelled
                        word before the cursor.  Doesn't recognize words
                        split over two lines, thus may stop at words that are
                        not highlighted as bad.  Does not stop at a word with
                        a missing capital at the start of a line.

                                                        *]S*
]S                      Like "]s" but only stop at bad words, not at rare
                        words or words for another region.

                                                        *[S*
[S                      Like "]S" but search backwards.


To add words to your own word list:                     *E764*

                                                        *zg*
zg                      Add word under the cursor as a good word to the first
                        name in 'spellfile'.  In Visual mode the selected
                        characters are added as a word (including white
                        space!).  If the word is explicitly marked as a bad
                        word in another spell file the result is
                        unpredictable.
                        A count may precede the command to indicate the entry
                        in 'spellfile' to be used.  A count of two uses the
                        second entry.

                                                        *zG*
zG                      Like "zg" but add the word to the internal word list
                        |internal-wordlist|.

                                                        *zw*
zw                      Like "zg" but mark the word as a wrong (bad) word.

                                                        *zW*
zW                      Like "zw" but add the word to the internal word list
                        |internal-wordlist|.

                                                        *:spe* *:spellgood*
:[count]spe[llgood] {word}
                        Add {word} as a good word to 'spellfile', like with
                        "zg".  Without count the first name is used, with a
                        count of two the second entry, etc.

:spe[llgood]! {word}    Add {word} as a good word to the internal word list,
                        like with "zG".

                                                        *:spellw* *:spellwrong*
:[count]spellw[rong] {word}
                        Add {word} as a wrong (bad) word to 'spellfile', as
                        with "zw".  Without count the first name is used, with
                        a count of two the second entry, etc.

:spellw[rong]! {word}   Add {word} as a wrong (bad) word to the internal word
                        list.

After adding a word to 'spellfile' with the above commands its associated
".spl" file will automatically be updated and reloaded.  If you change the
file named in 'spellfile' manually you need to use the |:mkspell| command.
This sequence of commands mostly works well: >
        :edit <file in 'spellfile'>
<       (make changes to the spell file) >
        :mkspell! %

More details about the 'spellfile' format below |spell-wordlist-format|.
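
A minimal sketch of using two word lists with a count.  The file names are
examples only (assumptions, not defaults), the "spell" directory is assumed
to exist already, and the names follow the ".{encoding}.add" pattern that
|:mkspell| expects: >
        :set spellfile=~/.vim/spell/en.latin1.add,~/.vim/spell/work.latin1.add
<       (move the cursor to a word) >
        zg
        2zg
        :2spellgood Moolenaar
<With these settings "zg" adds the word under the cursor to the first file,
while "2zg" and ":2spellgood" add to the second one.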

                                                        *internal-wordlist*
The internal word list is used for all buffers where 'spell' is set.  It is
not stored, it is lost when you exit Vim.  It is also cleared when 'encoding'
is set.


Finding suggestions for bad words:
                                                        *z?*
z?                      For the word under/after the cursor suggest correctly
                        spelled words.  This also works to find alternatives
                        for a word that is not highlighted as a bad word,
                        e.g., when the word after it is bad.
                        The results are sorted on similarity to the word
                        under/after the cursor.
                        This may take a long time.  Hit CTRL-C when you are
                        bored.
                        This does not work when there is a line break halfway
                        through a bad word (e.g., "the the").
                        You can enter the number of your choice or press
                        <Enter> if you don't want to replace.  You can also
                        use the mouse to click on your choice (only works if
                        the mouse can be used in Normal mode and when there
                        are no line wraps).  Click on the first (header) line
                        to cancel.
                        If 'verbose' is non-zero a score will be displayed to
                        indicate the likeness to the badly spelled word (the
                        higher the score the more different).
                        When a word was replaced the redo command "." will
                        repeat the word replacement.  This works like "ciw",
                        the good word and <Esc>.

                                        *:spellr* *:spellrepall* *E752* *E753*
:spellr[epall]          Repeat the replacement done by |z?| for all matches
                        with the replaced word in the current window.

In Insert mode, when the cursor is after a badly spelled word, you can use
CTRL-X s to find suggestions.  This works like Insert mode completion.  Use
CTRL-N to use the next suggestion, CTRL-P to go back. |i_CTRL-X_s|

The 'spellsuggest' option influences how the list of suggestions is generated
and sorted.  See |'spellsuggest'|.

The 'spellcapcheck' option is used to check that the first word of a sentence
starts with a capital.  This doesn't work for the first word in the file.
When there is a line break right after a sentence the highlighting of the next
line may be postponed.  Use |CTRL-L| when needed.  Also see |set-spc-auto| for
how it can be set automatically when 'spelllang' is set.

==============================================================================
2. Remarks on spell checking                            *spell-remarks*

PERFORMANCE

Vim does on-the-fly spell checking.  To make this work fast the word list is
loaded in memory.  Thus this uses a lot of memory (1 Mbyte or more).  There
might also be a noticeable delay when the word list is loaded, which happens
when 'spell' is set and when 'spelllang' is set while 'spell' was already set.
To minimize the delay each word list is only loaded once, it is not deleted
when 'spelllang' is made empty or 'spell' is reset.  When 'encoding' is set
all the word lists are reloaded, thus you may notice a delay then too.


REGIONS

A word may be spelled differently in various regions.  For example, English
comes in (at least) these variants:

        en              all regions
        en_au           Australia
        en_ca           Canada
        en_gb           Great Britain
        en_nz           New Zealand
        en_us           USA

Words that are not used in one region but are used in another region are
highlighted with SpellLocal |hl-SpellLocal|.

Always use lowercase letters for the language and region names.

When adding a word with |zg| or another command it's always added for all
regions.  You can change that by manually editing the 'spellfile'.  See
|spell-wordlist-format|.  Note that the regions as specified in the files in
'spellfile' are only used when all entries in 'spelllang' specify the same
region (not counting files specified by their .spl name).

Specific exception: For German these special regions are used:
        de              all German words accepted
        de_de           old and new spelling
        de_19           old spelling
        de_20           new spelling
        de_at           Austria
        de_ch           Switzerland
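
A short sketch of selecting regions through 'spelllang', using region names
from the lists above.  Which spell files are actually installed is an
assumption about your setup: >
        :setlocal spell spelllang=en_gb
        :setlocal spelllang=en_ca,de_20
        :setlocal spelllang=en
<The first command checks against British English, the second checks
Canadian English together with the new German spelling, and the last one
accepts English words from all regions.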


SPELL FILES                                             *spell-load*

Vim searches for spell files in the "spell" subdirectory of the directories in
'runtimepath'.  The name is: LL.EEE.spl, where:
        LL      the language name
        EEE     the value of 'encoding'

The value for "LL" comes from 'spelllang', but excludes the region name.
Examples:
        'spelllang'     LL ~
        en_us           en
        en-rare         en-rare
        medical_ca      medical

Only the first file is loaded, the one that is first in 'runtimepath'.  If
this succeeds then additionally files with the name LL.EEE.add.spl are loaded.
All the ones that are found are used.

Additionally, the files related to the names in 'spellfile' are loaded.  These
are the files that |zg| and |zw| add good and wrong words to.

Exceptions:
- Vim uses "latin1" when 'encoding' is "iso-8859-15".  The euro sign doesn't
  matter for spelling.
- When no spell file for 'encoding' is found "ascii" is tried.  This only
  works for languages where nearly all words are ASCII, such as English.  It
  helps when 'encoding' is not "latin1", such as iso-8859-2, and English text
  is being edited.  For the ".add" files the same name as the found main
  spell file is used.

For example, with these values:
        'runtimepath' is "~/.vim,/usr/share/vim70,~/.vim/after"
        'encoding' is "iso-8859-2"
        'spelllang' is "pl"

Vim will look for:
1. ~/.vim/spell/pl.iso-8859-2.spl
2. /usr/share/vim70/spell/pl.iso-8859-2.spl
3. ~/.vim/spell/pl.iso-8859-2.add.spl
4. /usr/share/vim70/spell/pl.iso-8859-2.add.spl
5. ~/.vim/after/spell/pl.iso-8859-2.add.spl

This assumes 1. is not found and 2. is found.

If 'encoding' is "latin1" Vim will look for:
1. ~/.vim/spell/pl.latin1.spl
2. /usr/share/vim70/spell/pl.latin1.spl
3. ~/.vim/after/spell/pl.latin1.spl
4. ~/.vim/spell/pl.ascii.spl
5. /usr/share/vim70/spell/pl.ascii.spl
6. ~/.vim/after/spell/pl.ascii.spl

This assumes none of them are found (Polish doesn't make sense when leaving
out the non-ASCII characters).
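
To see which of these files exist on your system, the search can be
approximated with |globpath()|.  This is only a sketch: it mimics the name
pattern described above for the language "pl" and does not reproduce the
"latin1" and "ascii" fallbacks.  The variable "name" is just a helper for
this example: >
        :let name = 'spell/pl.' . &encoding
        :echo globpath(&runtimepath, name . '.spl')
        :echo globpath(&runtimepath, name . '.add.spl')
<Each command lists the matching files, one per line, in 'runtimepath' order.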

Spelling for EBCDIC is currently not supported.

A spell file might not be available in the current 'encoding'.  See
|spell-mkspell| about how to create a spell file.  Converting a spell file
with "iconv" will NOT work!

                                                        *E758* *E759*
When loading a spell file Vim checks that it is properly formatted.  If you
get an error the file may be truncated, modified or intended for another Vim
version.


WORDS

Vim uses a fixed method to recognize a word.  This is independent of
'iskeyword', so that it also works in help files and for languages that
include characters like '-' in 'iskeyword'.  The word characters do depend on
'encoding'.

The table with word characters is stored in the main .spl file.  Therefore it
matters what the current locale is when generating it!  A .add.spl file does
not contain a word table though.

A word that starts with a digit is always ignored.  That includes hex numbers
in the form 0xff and 0XFF.


WORD COMBINATIONS

It is possible to spell-check words that include a space.  This is used to
recognize words that are invalid when used by themselves, e.g. for "et al.".
It can also be used to recognize "the the" and highlight it.

The number of spaces is irrelevant.  In most cases a line break may also
appear.  However, this makes it difficult to find out where to start checking
for spelling mistakes.  When you make a change to one line and only that line
is redrawn Vim won't look in the previous line, thus when "et" is at the end
of the previous line "al." will be flagged as an error.  And when you type
"the<CR>the" the highlighting doesn't appear until the first line is redrawn.
Use |CTRL-L| to redraw right away.  "[s" will also stop at a word combination
with a line break.

When encountering a line break Vim skips characters such as '*', '>' and '"',
so that comments in C, shell and Vim code can be spell checked.


SYNTAX HIGHLIGHTING                                     *spell-syntax*

Files that use syntax highlighting can specify where spell checking should be
done:

1.  everywhere                     default
2.  in specific items              use "contains=@Spell"
3.  everywhere but specific items  use "contains=@NoSpell"

For the second method adding the @NoSpell cluster will disable spell checking
again.  This can be used, for example, to add @Spell to the comments of a
program, and add @NoSpell for items that shouldn't be checked.
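
A minimal sketch of the second method for a made-up syntax file.  The group
names "demoComment" and "demoURL" are hypothetical and only serve as an
illustration: >
        :syntax region demoComment start="/\*" end="\*/" contains=@Spell,demoURL
        :syntax match demoURL "http://\S\+" contained contains=@NoSpell
<With these two items, text inside comments is spell checked, except for the
URLs matched by demoURL.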
Bram Moolenaar6bb68362005-03-22 23:03:44 +0000320
Bram Moolenaar30abd282005-06-22 22:35:10 +0000321
322VIM SCRIPTS
323
324If you want to write a Vim script that does something with spelling, you may
325find these functions useful:
326
327 spellbadword() find badly spelled word at the cursor
328 spellsuggest() get list of spelling suggestions
Bram Moolenaard042c562005-06-30 22:04:15 +0000329 soundfold() get the sound-a-like version of a word
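
A small sketch combining the first two functions.  The function name
"SuggestAtCursor" is made up for this example.  It assumes the behaviour of
released Vim 7 versions, where spellbadword() returns a List with the word
and its type and spellsuggest() accepts a maximum number of suggestions;
development snapshots may differ: >
        function! SuggestAtCursor()
          " spellbadword() only finds something when 'spell' is set.
          let bad = spellbadword()
          " Accept both a List result ([word, type]) and a plain string.
          let word = type(bad) == type([]) ? bad[0] : bad
          if word == ''
            return []
          endif
          " Ask for at most five suggestions.
          return spellsuggest(word, 5)
        endfunction
<Try it with the cursor on a misspelled word: >
        :echo SuggestAtCursor()
<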
Bram Moolenaar30abd282005-06-22 22:35:10 +0000330
Bram Moolenaar90cfdbe2005-08-12 19:59:19 +0000331
332SETTING 'spellcapcheck' AUTOMATICALLY *set-spc-auto*
333
334After the 'spelllang' option has been set successfully, Vim will source the
335files "spell/LANG.vim" in 'runtimepath'. "LANG" is the value of 'spelllang'
336up to the first comma, dot or underscore. This can be used to set options
337specifically for the language, especially 'spellcapcheck'.
338
339The distribution includes a few of these files. Use this command to see what
340they do: >
341 :next $VIMRUNTIME/spell/*.vim
342
343Note that the default scripts don't set 'spellcapcheck' if it was changed from
344the default value. This assumes the user prefers another value then.
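
A sketch of what such a file could look like for a hypothetical language
"xx" in which sentences do not start with a capital.  The file name and the
decision to disable the check are assumptions for illustration; it relies on
an empty 'spellcapcheck' switching the capital check off, as documented for
released Vim 7 versions: >
        " ~/.vim/spell/xx.vim
        setlocal spellcapcheck=
<Unlike the distributed scripts, this sketch overrules a value the user may
have set.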


DOUBLE SCORING                                          *spell-double-scoring*

The 'spellsuggest' option can be used to select "double" scoring.  This
mechanism is based on the principle that there are two kinds of spelling
mistakes:

1. You know how to spell the word, but mistype something.  This results in a
   small editing distance (character swapped/omitted/inserted) and possibly a
   word that sounds completely different.

2. You don't know how to spell the word and type something that sounds right.
   The edit distance can be big but the word is similar after sound-folding.

Since scores for these two mistakes will be very different we use a list
for each and mix them.

The sound-folding is slow and people that know the language won't make the
second kind of mistakes.  Therefore 'spellsuggest' can be set to select the
preferred method for scoring the suggestions.
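
For example, to select the double scoring described above (the other values
are listed under |'spellsuggest'|): >
        :set spellsuggest=double
<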

==============================================================================
3. Generating a spell file                              *spell-mkspell*

Vim uses a binary file format for spelling.  This greatly speeds up loading
the word list and keeps it small.
                                                    *.aff* *.dic* *Myspell*
You can create a Vim spell file from the .aff and .dic files that Myspell
uses.  Myspell is used by OpenOffice.org and Mozilla.  You should be able to
find them here:
        http://lingucomponent.openoffice.org/spell_dic.html
You can also use a plain word list.  The results are the same, the choice
depends on what word lists you can find.

If you install Aap (from www.a-a-p.org) you can use the recipes in the
runtime/spell/??/ directories.  Aap will take care of downloading the files,
applying the patches needed for Vim and building the .spl file.

Make sure your current locale is set properly, otherwise Vim doesn't know what
characters are upper/lower case letters.  If the locale isn't available (e.g.,
when using an MS-Windows codepage on Unix) add tables to the .aff file
|spell-affix-chars|.  If the .aff file doesn't define a table then the word
table of the currently active spelling is used.  If spelling is not active
then Vim will try to guess.

                                                        *:mksp* *:mkspell*
:mksp[ell][!] [-ascii] {outname} {inname} ...
                        Generate a Vim spell file from word lists.  Example: >
                :mkspell /tmp/nl nl_NL.words
<                                                       *E751*
                        When {outname} ends in ".spl" it is used as the output
                        file name.  Otherwise it should be a language name,
                        such as "en", without the region name.  The file
                        written will be "{outname}.{encoding}.spl", where
                        {encoding} is the value of the 'encoding' option.

                        When the output file already exists [!] must be used
                        to overwrite it.

                        When the [-ascii] argument is present, words with
                        non-ascii characters are skipped.  The resulting file
                        ends in "ascii.spl".

                        The input can be the Myspell format files {inname}.aff
                        and {inname}.dic.  If {inname}.aff does not exist then
                        {inname} is used as the file name of a plain word
                        list.

                        Multiple {inname} arguments can be given to combine
                        regions into one Vim spell file.  Example: >
                :mkspell ~/.vim/spell/en /tmp/en_US /tmp/en_CA /tmp/en_AU
<                       This combines the English word lists for US, CA and AU
                        into one en.spl file.
                        Up to eight regions can be combined. *E754* *E755*
                        The REP and SAL items of the first .aff file where
                        they appear are used. |spell-affix-REP|
                        |spell-affix-SAL|

                        This command uses a lot of memory, required to find
                        the optimal word tree (Polish requires a few hundred
                        Mbyte).  The final result will be much smaller.

                        If the spell file was in use in a buffer, it is
                        reloaded automatically after it has been written.

:mksp[ell] [-ascii] {name}.{enc}.add
                        Like ":mkspell" above, using {name}.{enc}.add as the
                        input file and producing an output file in the same
                        directory that has ".spl" appended.

:mksp[ell] [-ascii] {name}
                        Like ":mkspell" above, using {name} as the input file
                        and producing an output file in the same directory
                        that has ".{enc}.spl" appended.
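
A few illustrative invocations of the forms above.  All file names are
examples only (assumptions); adjust them to your own files and 'encoding': >
        :mkspell! ~/.vim/spell/nl /tmp/nl_NL
        :mkspell -ascii ~/.vim/spell/en /tmp/en_US
        :mkspell ~/.vim/spell/en.latin1.add
<The first command overwrites an existing "nl.{encoding}.spl", reading
/tmp/nl_NL.aff and /tmp/nl_NL.dic when the .aff file exists.  The second one
writes "en.ascii.spl", skipping words with non-ASCII characters.  The third
one writes "en.latin1.add.spl" next to its input file.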
Bram Moolenaar82cf9b62005-06-07 21:09:25 +0000440
Bram Moolenaarae5bce12005-08-15 21:41:48 +0000441Vim will report the number of duplicate words. This might be a mistake in the
442list of words. But sometimes it is used to have different prefixes and
443suffixes for the same basic word to avoid them combining (e.g. Czech uses
444this).
445
Bram Moolenaar82cf9b62005-06-07 21:09:25 +0000446Since you might want to change a Myspell word list for use with Vim the
447following procedure is recommended:
Bram Moolenaar217ad922005-03-20 22:37:15 +0000448
Bram Moolenaar13fcaaf2005-04-15 21:13:42 +00004491. Obtain the xx_YY.aff and xx_YY.dic files from Myspell.
4502. Make a copy of these files to xx_YY.orig.aff and xx_YY.orig.dic.
4513. Change the xx_YY.aff and xx_YY.dic files to remove bad words, add missing
Bram Moolenaar0cb032e2005-04-23 20:52:00 +0000452 words, define word characters with FOL/LOW/UPP, etc. The distributed
453 "src/spell/*.diff" files can be used.
Bram Moolenaard042c562005-06-30 22:04:15 +00004544. Start Vim with the right locale and use |:mkspell| to generate the Vim
455 spell file.
4565. Try out the spell file with ":set spell spelllang=xx" if you wrote it in
Bram Moolenaar7d1f5db2005-07-03 21:39:27 +0000457 a spell directory in 'runtimepath', or ":set spelllang=xx.enc.spl" if you
Bram Moolenaard042c562005-06-30 22:04:15 +0000458 wrote it somewhere else.
Bram Moolenaar217ad922005-03-20 22:37:15 +0000459
Bram Moolenaar13fcaaf2005-04-15 21:13:42 +0000460When the Myspell files are updated you can merge the differences:
Bram Moolenaar0cb032e2005-04-23 20:52:00 +00004611. Obtain the new Myspell files as xx_YY.new.aff and xx_UU.new.dic.
4622. Use Vimdiff to see what changed: >
Bram Moolenaar13fcaaf2005-04-15 21:13:42 +0000463 vimdiff xx_YY.orig.dic xx_YY.new.dic
Bram Moolenaar0cb032e2005-04-23 20:52:00 +00004643. Take over the changes you like in xx_YY.dic.
Bram Moolenaar13fcaaf2005-04-15 21:13:42 +0000465 You may also need to change xx_YY.aff.
Bram Moolenaar0cb032e2005-04-23 20:52:00 +00004664. Rename xx_YY.new.dic to xx_YY.orig.dic and xx_YY.new.aff to xx_YY.new.aff.
Bram Moolenaar217ad922005-03-20 22:37:15 +0000467

SPELL FILE DUMP

If for some reason you want to check what words are supported by the currently
used spelling files, use this command:

                                                        *:spelldump* *:spelld*
:spelld[ump]            Open a new window and fill it with all currently valid
                        words.
                        Note: For some languages the result may be enormous,
                        causing Vim to run out of memory.

The dump uses the word list format |spell-wordlist-format|.  You should be
able to read it with ":mkspell" to generate one .spl file that includes all
the words.
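
A sketch of that round trip.  The paths under /tmp are placeholders chosen
for this example: >
        :spelldump
        :write /tmp/all.words
        :mkspell /tmp/all /tmp/all.words
<This should produce "/tmp/all.{encoding}.spl", provided there is no
/tmp/all.words.aff file, so that the dump is read as a plain word list.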

When all entries in 'spelllang' use the same regions or no regions at all then
the region information is included in the dumped words.  Otherwise only words
for the current region are included and no "/regions" line is generated.

Comment lines with the name of the .spl file are used as a header above the
words that were generated from that .spl file.

==============================================================================
4. Spell file format                                    *spell-file-format*

This is the format of the files that are used by the person who creates and
maintains a word list.

Note that we avoid the word "dictionary" here.  That is because the goal of
spell checking differs from writing a dictionary (as in the book).  For
spelling we need a list of words that are OK, thus should not be
highlighted.  Person and company names will not appear in a dictionary, but do
appear in a word list.  And some old words are rarely used while they are
common misspellings.  These do appear in a dictionary but not in a word list.

There are two formats: A straight list of words and a list using affix
compression.  The files with affix compression are used by Myspell (Mozilla
and OpenOffice.org).  This requires two files, one with .aff and one with .dic
extension.


FORMAT OF STRAIGHT WORD LIST                           *spell-wordlist-format*

The words must appear one per line. That is all that is required.

Additionally the following items are recognized:

- Empty and blank lines are ignored.

- Lines starting with a # are ignored (comment lines).

- A line starting with "/encoding=", before any word, specifies the encoding
  of the file. After the '=' comes an encoding name. This tells Vim to set up
  conversion from the specified encoding to 'encoding'. Thus you can use one
  word list for several target encodings.

- A line starting with "/regions=" specifies the region names that are
  supported. Each region name must be two ASCII letters. The first one is
  region 1. Thus "/regions=usca" has region 1 "us" and region 2 "ca".
  In an addition word list the region names should be equal to those of the
  main word list!

- Other lines starting with '/' are reserved for future use. The ones that
  are not recognized are ignored (but you do get a warning message).

- A "/" may follow the word with the following items:
    =           Case must match exactly.
    ?           Rare word.
    !           Bad (wrong) word.
    digit       A region in which the word is valid. If no regions are
                specified the word is valid in all regions.

Example:

        # This is an example word list          comment
        /encoding=latin1                        encoding of the file
        /regions=uscagb                         regions "us", "ca" and "gb"
        example                                 word for all regions
        blah/12                                 word for regions "us" and "ca"
        vim/!                                   bad word
        Campbell/?3                             rare word in region 3 "gb"
        's mornings/=                           keep-case word

Note that when "/=" is used the same word with all upper-case letters is not
accepted. This is different from a word with mixed case that is automatically
marked as keep-case, those words may appear in all upper-case letters.
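
The line rules above can be sketched in a few lines of Python. This is only an
illustration, not how Vim reads the file: the function name and the returned
dictionary keys are made up for this example, and it assumes a line holds just
the word and its items (the right-hand column in the example above is
explanation, not file content).

```python
def parse_wordlist_line(line):
    """Parse one line of a straight word list; None means the line is ignored."""
    line = line.rstrip("\n")
    if not line.strip() or line.startswith("#"):
        return None                          # blank line or comment line
    if line.startswith("/"):
        # "/encoding=latin1", "/regions=uscagb" and reserved "/..." lines
        name, _, value = line[1:].partition("=")
        return {"directive": name, "value": value}
    word, _, items = line.partition("/")     # items after the "/" are optional
    return {
        "word": word,
        "keep_case": "=" in items,           # case must match exactly
        "rare": "?" in items,                # rare word
        "bad": "!" in items,                 # bad (wrong) word
        "regions": [int(c) for c in items if c in "0123456789"],
    }


print(parse_wordlist_line("Campbell/?3"))
```

An empty "regions" list stands for "valid in all regions" here.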


FORMAT WITH AFFIX COMPRESSION

There are two files: the basic word list and an affix file. The affixes are
used to modify the basic words to get the full word list. This significantly
reduces the number of words, especially for a language like Polish. This is
called affix compression.

The format for the affix and word list files is mostly identical to what
Myspell uses (the spell checker of Mozilla and OpenOffice.org). A description
can be found here:
        http://lingucomponent.openoffice.org/affix.readme ~
Note that affixes are case sensitive, this isn't obvious from the description.

Vim supports a few extras. Hopefully Myspell will support these too some day.
See |spell-affix-vim|.

The basic word list and the affix file are combined and turned into a binary
spell file. All the preprocessing has been done, thus this file loads fast.
The binary spell file format is described in the source code (src/spell.c).
But only developers need to know about it.

The preprocessing also allows us to take the Myspell language files and modify
them before the Vim word list is made. The tools for this can be found in the
"src/spell" directory.


WORD LIST FORMAT                                        *spell-dic-format*

A very short example, with line numbers:

        1       1234
        2       aan
        3       Als
        4       Etten-Leur
        5       et al.
        6       's-Gravenhage
        7       's-Gravenhaags
        8       bedel/P
        9       kado/1
        10      cadeau/2

The first line contains the number of words. Vim ignores it, but you do get
an error message if it's not there. *E760*

What follows is one word per line. There should be no white space before or
after the word.

When the word only has lower-case letters it will also match with the word
starting with an upper-case letter.

When the word includes an upper-case letter, this means the upper-case letter
is required at this position. The same word with a lower-case letter at this
position will not match. When some of the other letters are upper-case it will
not match either.

The word with all upper-case characters will always be OK.

        word list       matches         does not match ~
        als             als Als ALS     ALs AlS aLs aLS
        Als             Als ALS         als ALs AlS aLs aLS
        ALS             ALS             als Als ALs AlS aLs aLS
        AlS             AlS ALS         als Als ALs aLs aLS

The KEP affix ID can be used to specifically match a word with identical case
only, see below |spell-affix-KEP|.
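
The case rules in the table above can be written down as a small predicate.
The following Python sketch is an illustration only: the function name is
invented here, it does not handle the KEP flag, and it relies on Python's own
upper/lower conversion rather than the character tables from the affix file.

```python
def case_matches(entry, text):
    """Return True when 'text' is an accepted spelling of word list 'entry'."""
    if text == entry:
        return True                      # identical case always matches
    if text == entry.upper():
        return True                      # all upper-case is always OK
    if entry == entry.lower():
        # An all lower-case entry also matches with a leading capital.
        return text == entry[:1].upper() + entry[1:]
    return False                         # required upper-case letter missing


forms = ["als", "Als", "ALS", "ALs", "AlS", "aLs", "aLS"]
for entry in ["als", "Als", "ALS", "AlS"]:
    print(entry, [f for f in forms if case_matches(entry, f)])
```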

Note that lines 5 to 7 use non-word characters. You can include any character
in a word. When checking the text a word still only matches when it appears
with a non-word character before and after it. For Myspell a word starting
with a non-word character probably won't work.

After the word there is an optional slash and flags. Most of these flags are
letters that indicate the affixes that can be used with this word. These are
specified with SFX and PFX lines in the .aff file. See the Myspell
documentation.

                                                        *spell-affix-vim*
A flag that Vim adds and is not in Myspell is the flag defined with KEP in the
affix file. This has the meaning that case matters. This can be used if the
word does not have the first letter in upper case at the start of a sentence.
Example (assuming that = was used for KEP):

        word list     matches                does not match ~
        's morgens/=  's morgens             'S morgens 's Morgens 'S MORGENS
        's Morgens    's Morgens 'S MORGENS  'S morgens 's morgens

The flag can also be used to prevent the word from matching when it is in all
upper-case letters.

                                                        *spell-affix-mbyte*
The basic word list is normally in an 8-bit encoding, which is mentioned in
the affix file. The affix file must always be in the same encoding as the
word list. This is compatible with Myspell. For Vim the encoding may also be
something else, any encoding that "iconv" supports. The "SET" line must
specify the name of the encoding. When using a multi-byte encoding it's
possible to use more different affixes (but Myspell doesn't support that, thus
you may not want to use it anyway).


CHARACTER TABLES
                                                        *spell-affix-chars*
When using an 8-bit encoding (as specified with the "SET" line) the affix file
should define what characters are word characters. This is because the system
where ":mkspell" is used may not support a locale with this encoding and
isalpha() won't work. For example when using "cp1250" on Unix.

                                        *E761* *E762* *spell-affix-FOL*
                                        *spell-affix-LOW* *spell-affix-UPP*
Three lines in the affix file are needed. Simplistic example:

        FOL  áëñ ~
        LOW  áëñ ~
        UPP  ÁËÑ ~

All three lines must have exactly the same number of characters.

The "FOL" line specifies the case-folded characters. These are used to
compare words while ignoring case. For most encodings this is identical to
the lower case line.

The "LOW" line specifies the characters in lower-case. Mostly it's equal to
the "FOL" line.

The "UPP" line specifies the characters with upper-case. That is, a character
is upper-case where it's different from the character at the same position in
"FOL".

ASCII characters should be omitted, Vim always handles these in the same way.
When the encoding is UTF-8 no word characters need to be specified.
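
How the three lines relate can be illustrated with a short Python sketch. It
is not taken from the Vim sources; the function name and the two returned
structures are assumptions made for this example. It checks the "same number
of characters" rule and derives a case-folding map plus the set of upper-case
characters by position.

```python
def build_case_tables(fol, low, upp):
    """Build a fold map and an upper-case set from FOL, LOW and UPP strings."""
    if not (len(fol) == len(low) == len(upp)):
        raise ValueError("FOL, LOW and UPP must have the same number of characters")
    fold = {}        # character -> case-folded character
    upper = set()    # characters considered upper-case
    for f, l, u in zip(fol, low, upp):
        fold[l] = f
        fold[u] = f
        if u != f:   # upper-case where it differs from the FOL character
            upper.add(u)
    return fold, upper


fold, upper = build_case_tables("áëñ", "áëñ", "ÁËÑ")
print(fold["Ñ"], sorted(upper))
```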

                                                        *E763*
Vim allows you to use spell checking for several languages in the same file.
You can list them in the 'spelllang' option. As a consequence all spell files
for the same encoding must use the same word characters, otherwise they can't
be combined without errors. If you get a warning that the word tables differ
you may need to generate the .spl file again with |:mkspell|. Check the FOL,
LOW and UPP lines in the .aff file that was used.

The XX.ascii.spl spell file generated with the "-ascii" argument will not
contain the table with characters, so that it can be combined with spell files
for any encoding. The .add.spl files also do not contain the table.


MID-WORD CHARACTERS
                                                        *spell-midword*
Some characters are only to be considered word characters if they are used in
between two ordinary word characters. An example is the single quote: It is
often used to put text in quotes, thus it can't be recognized as a word
character, but when it appears in between word characters it must be part of
the word. This is needed to detect a spelling error such as they'are. That
should be they're, but since "they" and "are" are words themselves that would
go unnoticed.

These characters are defined with MIDWORD in the .aff file:

        MIDWORD '- ~
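
The effect on word splitting can be sketched in Python. This is an
illustration of the rule described above, not Vim's implementation: the
function name is made up, and str.isalpha() stands in for "ordinary word
character".

```python
def split_words(text, midword="'-"):
    """Split text into words; MIDWORD characters count only between letters."""
    words, current = [], ""
    for i, ch in enumerate(text):
        next_is_letter = i + 1 < len(text) and text[i + 1].isalpha()
        if ch.isalpha():
            current += ch
        elif ch in midword and current and next_is_letter:
            current += ch            # in between two word characters
        else:
            if current:
                words.append(current)
            current = ""
    if current:
        words.append(current)
    return words


print(split_words("they'are 'quoted' well-known"))
```

With this rule "they'are" stays one word and can be flagged as a mistake,
while the quotes around "quoted" are not made part of the word.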


AFFIXES
                                        *spell-affix-PFX* *spell-affix-SFX*
The usual PFX (prefix) and SFX (suffix) lines are supported (see the Myspell
documentation or the Aspell manual:
http://aspell.net/man-html/Affix-Compression.html).

Note that Myspell ignores any extra text after the relevant info. Vim
requires this text to start with a "#" so that mistakes don't go unnoticed.
Example:

        SFX F 0 in   [^i]n      # Spion > Spionin ~
        SFX F 0 nen  in         # Bauerin > Bauerinnen ~

An extra item for Vim is the "rare" flag. It must come after the other
fields, before a comment. When it is used, all words that use the affix are
marked as rare words. Example:

        PFX F 0 nene .     rare ~
        SFX F 0 oin  n     rare  # hardly ever used ~

However, if the word also appears as a good word in another way it won't be
marked as rare.
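
To make the fields of such a line concrete, here is a rough Python sketch that
splits a rule line and applies a suffix rule to a word. The function names are
invented for this example, and using Python's "re" module for the condition is
an assumption; it happens to cover the simple conditions shown above, but it
is not the matcher Myspell or Vim uses.

```python
import re


def parse_affix_rule(line):
    """Split an SFX/PFX rule line into its fields; text after '#' is a comment."""
    fields = line.split("#", 1)[0].split()
    kind, flag, strip, add, cond = fields[:5]
    return {"kind": kind, "flag": flag, "strip": strip, "add": add,
            "cond": cond, "rare": "rare" in fields[5:]}


def apply_suffix(word, rule):
    """Return the suffixed word, or None when the rule does not apply."""
    if not re.search("(?:" + rule["cond"] + ")$", word):
        return None                          # condition at end of word fails
    if rule["strip"] != "0":                 # "0" means nothing to strip
        if not word.endswith(rule["strip"]):
            return None
        word = word[:-len(rule["strip"])]
    return word + ("" if rule["add"] == "0" else rule["add"])


rule = parse_affix_rule("SFX F 0 in [^i]n # Spion > Spionin")
print(apply_suffix("Spion", rule))
```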

                                                *spell-affix-PFXPOSTPONE*
When an affix file has very many prefixes that apply to many words it's not
possible to build the whole word list in memory. This applies to Hebrew (a
list with all words is over a Gbyte). In that case applying prefixes must be
postponed. This makes spell checking slower. It is indicated by this keyword
in the .aff file:

        PFXPOSTPONE ~

Only prefixes without a chop string can be postponed, prefixes with a chop
string will still be included in the word list. An exception is when the chop
string is one character and equal to the last character of the added string,
but in lower case. This is the case when the chop string is used to allow the
following word to start with an upper-case letter.

It is not possible to use PFXPOSTPONE together with COMPOUNDFLAG or
COMPOUNDFLAGS.


WORDS WITH A SLASH                                      *spell-affix-SLASH*

The slash is used in the .dic file to separate the basic word from the affix
letters that can be used. Unfortunately, this means you cannot use a slash in
a word. Thus "TCP/IP" cannot be a word. To work around that you can define a
replacement character for the slash. Example:

        SLASH , ~

Now you can use "TCP,IP" to add the word "TCP/IP".

Of course, the letter used should itself not appear in any word! The letter
must be ASCII, thus a single byte.
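
The substitution can be pictured with this small Python sketch. The function
name and its "slash_char" parameter are hypothetical; the sketch only shows
the order of the two steps: first split off the affix letters, then put the
real slash back into the word.

```python
def parse_dic_entry(line, slash_char=None):
    """Split a .dic line into (word, flags), honouring a SLASH replacement."""
    word, _, flags = line.partition("/")     # the "/" separates word and flags
    if slash_char:
        word = word.replace(slash_char, "/") # e.g. "TCP,IP" becomes "TCP/IP"
    return word, flags


print(parse_dic_entry("TCP,IP", slash_char=","))
```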


KEEP-CASE WORDS                                         *spell-affix-KEP*

In the affix file a KEP line can be used to define the affix name used for
keep-case words. Example:

        KEP = ~

See above for an example |spell-affix-vim|.


RARE WORDS                                              *spell-affix-RAR*

In the affix file a RAR line can be used to define the affix name used for
rare words. Example:

        RAR ? ~

Rare words are highlighted differently from bad words. This is to be used for
words that are correct for the language, but are hardly ever used and could be
a typing mistake anyway. When the same word is found as good it won't be
highlighted as rare.


BAD WORDS                                               *spell-affix-BAD*

In the affix file a BAD line can be used to define the affix name used for
bad words. Example:

        BAD ! ~

This can be used to exclude words that would otherwise be good. For example
"the the" in the .dic file:

        the the/! ~

Once a word has been marked as bad it won't be undone by encountering the same
word as good.


COMPOUND WORDS                                          *spell-affix-compound*

A compound word is a longer word made by concatenating words. To specify
which words may be concatenated a character is used. This character is put in
the list of affixes after the word. We will call this character a flag here.
Obviously these flags must be different from any affix IDs used.

                                                        *spell-COMPOUNDFLAG*
The Myspell compatible method uses one flag, specified with COMPOUNDFLAG.
All words with this flag combine in any order and without limit in length.
This means there is no control over which word comes first. Example:
        COMPOUNDFLAG c ~

                                                        *spell-COMPOUNDFLAGS*
The method added by Vim allows specifying which words can be prepended to
other words, and which words can be appended to other words. This is a list
of comma separated items. Each item may contain zero or more dashes and plus
signs.

NOTE: At this moment COMPOUNDFLAGS has not been implemented yet!

An item without dashes specifies words that combine in any order and as often
as possible. Example:
        COMPOUNDFLAGS c,m ~

This allows all words with the "c" flag to be combined and all words with the
"m" flag to be combined, but a word with the "c" flag doesn't combine with a
word with the "m" flag.

Flags that are put together, without a separating comma, are considered
interchangeable. Example:
        COMPOUNDFLAGS cm ~

This allows all words with the "c" and/or "m" flag to be combined.

An item with one dash specifies flags for a leading word and flags for a
trailing word. Thus only two-word combinations are made. Example:
        COMPOUNDFLAGS f-d ~

Here the 'f' flag can be used for food and 'd' for dishes, such that you can
use these words in the dictionary:
        tomato/f ~
        onion/f ~
        soup/d ~
        salat/d ~

Which makes the words:
        tomato
        onion
        soup
        salat
        tomatosoup
        tomatosalat
        onionsoup
        onionsalat

Note that something like "souptomato" is not possible. And that it's actually
easier to list all the words if you have only these few.

More dashes can be used to allow more words to combine. For example:
        COMPOUNDFLAGS f-d,f-f-d ~

Would allow "tomatoonionsoup" (OK, so this is a bad example, but you get the
idea).
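
Since COMPOUNDFLAGS is not implemented, the following Python sketch only
illustrates the semantics described above for items that consist of flags
separated by dashes. The function name and the "words_by_flag" mapping are
invented for this example; items without a dash and items with a plus are
deliberately not handled.

```python
from itertools import product


def expand_dash_item(item, words_by_flag):
    """List the compounds allowed by one dash-separated item such as 'f-d'."""
    slots = item.split("-")                  # one slot per word position
    pools = [
        [w for flag in slot for w in words_by_flag.get(flag, [])]
        for slot in slots                    # flags within a slot are alternatives
    ]
    return ["".join(combo) for combo in product(*pools)]


words_by_flag = {"f": ["tomato", "onion"], "d": ["soup", "salat"]}
print(expand_dash_item("f-d", words_by_flag))
```

For "f-f-d" the same function yields three-word compounds, among them
"tomatoonionsoup".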

When a word can be used an undetermined number of times use a plus instead of
a dash. Example:
        COMPOUNDFLAGS f+d ~

Then you can make tasty "oniononiontomatotomatosoup".

The "+" may also appear at the end, which means that the last flags can be
repeated many times. Example:
        COMPOUNDFLAGS f-d+ ~

Which allows the use of "onionsoupsoupsoupsoupsoupsoup".

                                                        *spell-COMPOUNDMIN*
The minimal length of a word used for concatenation is specified with
COMPOUNDMIN. Example:
        COMPOUNDMIN 5 ~

When omitted a minimal length of 3 bytes is used. Obviously you could just
leave out the compound flag from short words instead, this feature is present
for compatibility with Myspell.

                                                        *spell-CMP*
NOTE: At this moment CMP has not been implemented yet!

Sometimes it is necessary to change a word when concatenating it to another,
by removing a few letters, inserting something or both. It can also be useful
to restrict concatenation to words that match a pattern. For this purpose CMP
items can be used. They look like this:
        CMP {flag} {strip} {add} {cond} {cond2}

        {flag}          the flag, as used in COMPOUNDFLAGS for the lead word
        {strip}         text to remove from the end of the lead word (zero
                        for no stripping)
        {add}           text to insert between the words (zero for no
                        addition)
        {cond}          condition to match at the end of the lead word
        {cond2}         condition to match at the start of the following word

This is exactly the same as what is used for SFX and PFX items, except there
is an extra condition. Example:
        CMP f 0 - . . ~

When used with the food and dish word list above, this means that a dash is
inserted after each food item. Thus you get "onion-soup" and
"onion-tomato-salat".

When there are CMP items for a compound flag the concatenation is only done
when a CMP item matches.

When there are no CMP items for a compound flag, then all words will be
concatenated, as if there was an item:
        CMP {flag} 0 0 . .


REPLACEMENTS                                            *spell-affix-REP*

In the affix file REP items can be used to define common mistakes. This is
used to make spelling suggestions. The items define the "from" text and the
"to" replacement. Example:

        REP 4 ~
        REP f ph ~
        REP ph f ~
        REP k ch ~
        REP ch k ~

The first line specifies the number of REP lines following. Vim ignores it.
Don't include simple one-character replacements or swaps. Vim will try these
anyway. You can include whole words if you want to, but you might want to use
the "file:" item in 'spellsuggest' instead.
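
As an illustration of how "from"/"to" pairs can produce suggestions, this
Python sketch replaces each occurrence of a "from" text by its "to" text, one
at a time. The function name is made up and this is not Vim's suggestion
algorithm; it assumes the resulting candidates would still be checked against
the word list and scored.

```python
def rep_candidates(word, rep_items):
    """Return candidate words made by applying one REP replacement at a time."""
    candidates = []
    for src, dst in rep_items:
        start = word.find(src)
        while start != -1:                   # try every position of "from"
            cand = word[:start] + dst + word[start + len(src):]
            if cand not in candidates:
                candidates.append(cand)
            start = word.find(src, start + 1)
    return candidates


rep_items = [("f", "ph"), ("ph", "f"), ("k", "ch"), ("ch", "k")]
print(rep_candidates("fone", rep_items))
```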


SIMILAR CHARACTERS                                      *spell-affix-MAP*

In the affix file MAP items can be used to define letters that are very much
alike. This is mostly used for a letter with different accents. This is used
to prefer suggestions with these letters substituted. Example:

        MAP 2 ~
        MAP eéëêè ~
        MAP uüùúû ~

The first line specifies the number of MAP lines following. Vim ignores it.

Each letter must appear in only one of the MAP items. It's a bit more
efficient if the first letter is ASCII or at least one without accents.


SOUND-A-LIKE                                            *spell-affix-SAL*

In the affix file SAL items can be used to define the sounds-a-like mechanism
to be used. The main items define the "from" text and the "to" replacement.
Simplistic example:

        SAL CIA  X ~
        SAL CH   X ~
        SAL C    K ~
        SAL K    K ~

There are a few rules and this can become quite complicated. An explanation
of how it works can be found in the Aspell manual:
http://aspell.net/man-html/Phonetic-Code.html.

There are a few special items:

        SAL followup            true ~
        SAL collapse_result     true ~
        SAL remove_accents      true ~

"1" has the same meaning as "true". Any other value means "false".


SIMPLE SOUNDFOLDING             *spell-affix-SOFOFROM* *spell-affix-SOFOTO*

The SAL mechanism is complex and slow. A simpler mechanism is mapping all
characters to another character, mapping similar sounding characters to the
same character. At the same time this does case folding. You cannot have
both SAL items and simple soundfolding.

There are two items required: one to specify the characters that are mapped
and one that specifies the characters they are mapped to. They must have
exactly the same number of characters. Example:

        SOFOFROM abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ ~
        SOFOTO   ebctefghejklnnepkrstevvkesebctefghejklnnepkrstevvkes ~

In the example all vowels are mapped to the same character 'e'. Another
method would be to leave out all vowels. Some characters that sound nearly
the same and are often mixed up, such as 'm' and 'n', are mapped to the same
character. Don't do this too much, all words will start looking alike.

Characters that do not appear in SOFOFROM will be left out, except that all
white space is replaced by one space. Sequences of the same character in
SOFOFROM are replaced by one.

You can use the |soundfold()| function to try out the results. Or set the
'verbose' option to see the score in the output of the |z?| command.
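
The simple soundfolding rules can be tried out with a rough Python sketch. It
is not the code behind |soundfold()|; the function name is invented, and it
assumes that "sequences of the same character" means repeated characters in
the mapped result are collapsed into one.

```python
SOFOFROM = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
SOFOTO = "ebctefghejklnnepkrstevvkesebctefghejklnnepkrstevvkes"


def simple_soundfold(text, sofofrom=SOFOFROM, sofoto=SOFOTO):
    """Fold text with a SOFOFROM/SOFOTO style character mapping."""
    if len(sofofrom) != len(sofoto):
        raise ValueError("SOFOFROM and SOFOTO must have the same length")
    table = dict(zip(sofofrom, sofoto))
    out = []
    for ch in text:
        mapped = " " if ch.isspace() else table.get(ch)
        if mapped is None:
            continue                 # character not in SOFOFROM is left out
        if out and out[-1] == mapped:
            continue                 # assumption: collapse repeated results
        out.append(mapped)
    return "".join(out)


print(simple_soundfold("Moon"), simple_soundfold("noon"))
```

Both words fold to the same string, which is what makes one a candidate
suggestion for the other.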


 vim:tw=78:sw=4:ts=8:ft=help:norl: