*spell.txt*     For Vim version 7.0aa. Last change: 2005 Aug 16


                  VIM REFERENCE MANUAL    by Bram Moolenaar


Spell checking                                          *spell*

1. Quick start                  |spell-quickstart|
2. Remarks on spell checking    |spell-remarks|
3. Generating a spell file      |spell-mkspell|
4. Spell file format            |spell-file-format|

{Vi does not have any of these commands}

Spell checking is not available when the |+syntax| feature has been disabled
at compile time.

==============================================================================
1. Quick start                                          *spell-quickstart*

This command switches on spell checking: >

        :setlocal spell spelllang=en_us

This switches on the 'spell' option and specifies to check for US English.
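
As a small illustration (the choice of British English is only an example),
the options can be inspected, changed and switched off again: >

        :setlocal spell? spelllang?
        :setlocal spelllang=en_gb
        :setlocal nospell
<The last command only switches off spell checking for the current window.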

The words that are not recognized are highlighted with one of these:
        SpellBad        word not recognized                     |hl-SpellBad|
        SpellCap        word not capitalised                    |hl-SpellCap|
        SpellRare       rare word                               |hl-SpellRare|
        SpellLocal      wrong spelling for selected region      |hl-SpellLocal|

Vim only checks words for spelling, there is no grammar check.

If the 'mousemodel' option is set to "popup" and the cursor is on a badly
spelled word or it is "popup_setpos" and the mouse pointer is on a badly
spelled word, then the popup menu will contain a submenu to replace the bad
word. Note: this slows down the appearance of the popup menu.

To search for the next misspelled word:

                                                        *]s* *E756*
]s                      Move to next misspelled word after the cursor.
                        A count before the command can be used to repeat.

                                                        *[s*
[s                      Like "]s" but search backwards, find the misspelled
                        word before the cursor. Doesn't recognize words
                        split over two lines, thus may stop at words that are
                        not highlighted as bad. Does not stop at a word with
                        a missing capital at the start of a line.

                                                        *]S*
]S                      Like "]s" but only stop at bad words, not at rare
                        words or words for another region.

                                                        *[S*
[S                      Like "]S" but search backwards.
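
A count and a mapping can be combined with these motions. This is a minimal
sketch; the <F8> key is an arbitrary choice and not one of Vim's defaults: >

        " Jump to the third misspelled word after the cursor.
        :normal 3]s
        " Hypothetical mapping: go to the next bad word and list suggestions.
        :nnoremap <F8> ]Sz?
<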


To add words to your own word list:                     *E764*

                                                        *zg*
zg                      Add word under the cursor as a good word to the first
                        name in 'spellfile'. In Visual mode the selected
                        characters are added as a word (including white
                        space!). If the word is explicitly marked as a bad
                        word in another spell file the result is
                        unpredictable.
                        A count may precede the command to indicate the entry
                        in 'spellfile' to be used. A count of two uses the
                        second entry.

                                                        *zG*
zG                      Like "zg" but add the word to the internal word list
                        |internal-wordlist|.

                                                        *zw*
zw                      Like "zg" but mark the word as a wrong (bad) word.

                                                        *zW*
zW                      Like "zw" but add the word to the internal word list
                        |internal-wordlist|.

                                                        *:spe* *:spellgood*
:[count]spe[llgood] {word}
                        Add {word} as a good word to 'spellfile', like with
                        "zg". Without count the first name is used, with a
                        count of two the second entry, etc.

:spe[llgood]! {word}    Add {word} as a good word to the internal word list,
                        like with "zG".

                                                        *:spellw* *:spellwrong*
:[count]spellw[rong] {word}
                        Add {word} as a wrong (bad) word to 'spellfile', as
                        with "zw". Without count the first name is used, with
                        a count of two the second entry, etc.

:spellw[rong]! {word}   Add {word} as a wrong (bad) word to the internal word
                        list.
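
A sketch of these commands in use. The file names and words are assumptions
made for this example; the names are expected to follow the rules given at
|'spellfile'| (ending in ".{encoding}.add") and the directory must exist.
"spf" is the short name of 'spellfile': >

        :setlocal spf=~/.vim/spell/en.latin1.add,~/.vim/spell/my.latin1.add
        " Good word, added to the first file.
        :spellgood Moolenaar
        " Good word, added to the second file.
        :2spellgood frobnicate
        " Bad word, added to the first file.
        :spellwrong teh
        " Good word for this Vim session only (internal word list).
        :spellgood! xyzzy
<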

After adding a word to 'spellfile' with the above commands its associated
".spl" file will automatically be updated and reloaded. If you change
'spellfile' manually you need to use the |:mkspell| command. This sequence of
commands mostly works well: >
        :edit <file in 'spellfile'>
<       (make changes to the spell file) >
        :mkspell! %

More details about the 'spellfile' format below |spell-wordlist-format|.

                                                        *internal-wordlist*
The internal word list is used for all buffers where 'spell' is set. It is
not stored, it is lost when you exit Vim. It is also cleared when 'encoding'
is set.


Finding suggestions for bad words:
                                                        *z?*
z?                      For the word under/after the cursor suggest correctly
                        spelled words. This also works to find alternatives
                        for a word that is not highlighted as a bad word,
                        e.g., when the word after it is bad.
                        The results are sorted on similarity to the word
                        under/after the cursor.
                        This may take a long time. Hit CTRL-C when you are
                        bored.
                        This does not work when there is a line break halfway
                        through a bad word (e.g., "the the").
                        You can enter the number of your choice or press
                        <Enter> if you don't want to replace. You can also
                        use the mouse to click on your choice (only works if
                        the mouse can be used in Normal mode and when there
                        are no line wraps). Click on the first (header) line
                        to cancel.
                        If 'verbose' is non-zero a score will be displayed to
                        indicate the likeness to the badly spelled word (the
                        higher the score the more different).
                        When a word was replaced the redo command "." will
                        repeat the word replacement. This works like "ciw",
                        the good word and <Esc>.

                                        *:spellr* *:spellrepall* *E752* *E753*
:spellr[epall]          Repeat the replacement done by |z?| for all matches
                        with the replaced word in the current window.

In Insert mode, when the cursor is after a badly spelled word, you can use
CTRL-X s to find suggestions. This works like Insert mode completion. Use
CTRL-N to use the next suggestion, CTRL-P to go back. |i_CTRL-X_s|

The 'spellsuggest' option influences how the list of suggestions is generated
and sorted. See |'spellsuggest'|.

The 'spellcapcheck' option is used to check that the first word of a sentence
starts with a capital. This doesn't work for the first word in the file.
When there is a line break right after a sentence the highlighting of the next
line may be postponed. Use |CTRL-L| when needed. Also see |set-spc-auto| for
how it can be set automatically when 'spelllang' is set.

==============================================================================
2. Remarks on spell checking                            *spell-remarks*

PERFORMANCE

Vim does on-the-fly spell checking. To make this work fast the word list is
loaded in memory. Thus this uses a lot of memory (1 Mbyte or more). There
might also be a noticeable delay when the word list is loaded, which happens
when 'spell' is set and when 'spelllang' is set while 'spell' was already set.
To minimize the delay each word list is only loaded once, it is not deleted
when 'spelllang' is made empty or 'spell' is reset. When 'encoding' is set
all the word lists are reloaded, thus you may notice a delay then too.


REGIONS

A word may be spelled differently in various regions. For example, English
comes in (at least) these variants:

        en              all regions
        en_au           Australia
        en_ca           Canada
        en_gb           Great Britain
        en_nz           New Zealand
        en_us           USA

Words that are not used in one region but are used in another region are
highlighted with SpellLocal |hl-SpellLocal|.
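
For illustration, the region is selected through the name given in
'spelllang'. The language names used here assume that the matching spell
files are installed: >

        " Accept British spelling; words only used elsewhere get SpellLocal.
        :setlocal spell spelllang=en_gb
        " Accept the words of all English regions.
        :setlocal spelllang=en
        " Check against two languages at the same time.
        :setlocal spelllang=en_us,nl
<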

Always use lowercase letters for the language and region names.

When adding a word with |zg| or another command it's always added for all
regions. You can change that by manually editing the 'spellfile'. See
|spell-wordlist-format|. Note that the regions as specified in the files in
'spellfile' are only used when all entries in 'spelllang' specify the same
region (not counting files specified by their .spl name).

                                                        *spell-german*
Specific exception: For German these special regions are used:
        de              all German words accepted
        de_de           old and new spelling
        de_19           old spelling
        de_20           new spelling
        de_at           Austria
        de_ch           Switzerland

                                                        *spell-yiddish*
Yiddish requires using "utf-8" encoding, because of the special characters
used. If you are using latin1 Vim will use transliterated (romanized) Yiddish
instead. If you want to use transliterated Yiddish with utf-8 use "yi-tr".
In a table:
        'encoding'      'spelllang'
        utf-8           yi              Yiddish
        latin1          yi              transliterated Yiddish
        utf-8           yi-tr           transliterated Yiddish


SPELL FILES                                             *spell-load*

Vim searches for spell files in the "spell" subdirectory of the directories in
'runtimepath'. The name is: LL.EEE.spl, where:
        LL      the language name
        EEE     the value of 'encoding'

The value for "LL" comes from 'spelllang', but excludes the region name.
Examples:
        'spelllang'     LL ~
        en_us           en
        en-rare         en-rare
        medical_ca      medical

Only the first file is loaded, the one that is first in 'runtimepath'. If
this succeeds then additionally files with the name LL.EEE.add.spl are loaded.
All the ones that are found are used.

Additionally, the files related to the names in 'spellfile' are loaded. These
are the files that |zg| and |zw| add good and wrong words to.

Exceptions:
- Vim uses "latin1" when 'encoding' is "iso-8859-15". The euro sign doesn't
  matter for spelling.
- When no spell file for 'encoding' is found "ascii" is tried. This only
  works for languages where nearly all words are ASCII, such as English. It
  helps when 'encoding' is not "latin1", such as iso-8859-2, and English text
  is being edited. For the ".add" files the same name as the found main
  spell file is used.

For example, with these values:
        'runtimepath' is "~/.vim,/usr/share/vim70,~/.vim/after"
        'encoding'    is "iso-8859-2"
        'spelllang'   is "pl"

Vim will look for:
1. ~/.vim/spell/pl.iso-8859-2.spl
2. /usr/share/vim70/spell/pl.iso-8859-2.spl
3. ~/.vim/spell/pl.iso-8859-2.add.spl
4. /usr/share/vim70/spell/pl.iso-8859-2.add.spl
5. ~/.vim/after/spell/pl.iso-8859-2.add.spl

This assumes 1. is not found and 2. is found.

If 'encoding' is "latin1" Vim will look for:
1. ~/.vim/spell/pl.latin1.spl
2. /usr/share/vim70/spell/pl.latin1.spl
3. ~/.vim/after/spell/pl.latin1.spl
4. ~/.vim/spell/pl.ascii.spl
5. /usr/share/vim70/spell/pl.ascii.spl
6. ~/.vim/after/spell/pl.ascii.spl

This assumes none of them are found (Polish doesn't make sense when leaving
out the non-ASCII characters).
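
To see which of these files exist on your system, something like the
following can be used. This is only a sketch: it relies on the generic
globpath() function, the variable names are made up, and it does not
reproduce the "latin1" and "ascii" fallbacks described above: >

        :let lang = "pl"
        :let base = "spell/" . lang . "." . &encoding
        " Main spell files found in 'runtimepath'.
        :echo globpath(&runtimepath, base . ".spl")
        " Additional word lists found in 'runtimepath'.
        :echo globpath(&runtimepath, base . ".add.spl")
<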

Spelling for EBCDIC is currently not supported.

A spell file might not be available in the current 'encoding'. See
|spell-mkspell| about how to create a spell file. Converting a spell file
with "iconv" will NOT work!

                                                        *E758* *E759*
When loading a spell file Vim checks that it is properly formatted. If you
get an error the file may be truncated, modified or intended for another Vim
version.


WORDS

Vim uses a fixed method to recognize a word. This is independent of
'iskeyword', so that it also works in help files and for languages that
include characters like '-' in 'iskeyword'. The word characters do depend on
'encoding'.

The table with word characters is stored in the main .spl file. Therefore it
matters what the current locale is when generating it! A .add.spl file does
not contain a word table though.

A word that starts with a digit is always ignored. That includes hex numbers
in the form 0xff and 0XFF.


WORD COMBINATIONS

It is possible to spell-check words that include a space. This is used to
recognize words that are invalid when used by themselves, e.g. for "et al.".
It can also be used to recognize "the the" and highlight it.

The number of spaces is irrelevant. In most cases a line break may also
appear. However, this makes it difficult to find out where to start checking
for spelling mistakes. When you make a change to one line and only that line
is redrawn Vim won't look in the previous line, thus when "et" is at the end
of the previous line "al." will be flagged as an error. And when you type
"the<CR>the" the highlighting doesn't appear until the first line is redrawn.
Use |CTRL-L| to redraw right away. "[s" will also stop at a word combination
with a line break.

When encountering a line break Vim skips characters such as '*', '>' and '"',
so that comments in C, shell and Vim code can be spell checked.


SYNTAX HIGHLIGHTING                                     *spell-syntax*

Files that use syntax highlighting can specify where spell checking should be
done:

1. everywhere                      default
2. in specific items               use "contains=@Spell"
3. everywhere but specific items   use "contains=@NoSpell"

For the second method adding the @NoSpell cluster will disable spell checking
again. This can be used, for example, to add @Spell to the comments of a
program, and add @NoSpell for items that shouldn't be checked.
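
A sketch for a hypothetical syntax file that uses the second method. The
group names "myComment" and "myUrl" are made up for this example: >

        " Check spelling in C-style comments, but not in URLs inside them.
        :syntax match  myUrl     "http://\S\+" contained contains=@NoSpell
        :syntax region myComment start="/\*" end="\*/" contains=@Spell,myUrl
<Because the @Spell cluster is used here, text outside of "myComment" is not
checked.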


VIM SCRIPTS

If you want to write a Vim script that does something with spelling, you may
find these functions useful:

    spellbadword()      find badly spelled word at the cursor
    spellsuggest()      get list of spelling suggestions
    soundfold()         get the sound-a-like version of a word
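
A minimal sketch of calling these functions. It assumes 'spell' is set and a
word list is loaded. The misspelled word is only an example, and the exact
return values depend on the Vim version and on the word list in use: >

        " Show the badly spelled word at the cursor, if any.
        :echo spellbadword()
        " Show the sound-folded form of a word.
        :echo soundfold("recieve")
        " List the suggestions for a word, one per line.
        :for s in spellsuggest("recieve")
        :  echo s
        :endfor
<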


SETTING 'spellcapcheck' AUTOMATICALLY                   *set-spc-auto*

After the 'spelllang' option has been set successfully, Vim will source the
files "spell/LANG.vim" in 'runtimepath'. "LANG" is the value of 'spelllang'
up to the first comma, dot or underscore. This can be used to set options
specifically for the language, especially 'spellcapcheck'.

The distribution includes a few of these files. Use this command to see what
they do: >
        :next $VIMRUNTIME/spell/*.vim

Note that the default scripts don't set 'spellcapcheck' if it was changed from
the default value. This assumes the user prefers another value then.
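
A sketch of such a file for a hypothetical language "xx" in which sentences
do not start with a capital. The file name and the language are assumptions
made for this example: >

        " ~/.vim/spell/xx.vim
        " Sourced after ":setlocal spelllang=xx". An empty 'spellcapcheck'
        " switches off the check for a capital at the start of a sentence.
        setlocal spellcapcheck=
<Unlike the distributed scripts, this sketch does not test whether the user
already changed 'spellcapcheck'.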


DOUBLE SCORING                                          *spell-double-scoring*

The 'spellsuggest' option can be used to select "double" scoring. This
mechanism is based on the principle that there are two kinds of spelling
mistakes:

1. You know how to spell the word, but mistype something. This results in a
   small editing distance (character swapped/omitted/inserted) and possibly a
   word that sounds completely different.

2. You don't know how to spell the word and type something that sounds right.
   The edit distance can be big but the word is similar after sound-folding.

Since scores for these two mistakes will be very different we use a list
for each and mix them.

The sound-folding is slow and people that know the language won't make the
second kind of mistakes. Therefore 'spellsuggest' can be set to select the
preferred method for scoring the suggestions.
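
For example, to try "double" scoring and then return to the default (see
|'spellsuggest'| for the values that your Vim version accepts): >

        " Use both lists, as described above. This is slower.
        :set spellsuggest=double
        " Go back to the default value.
        :set spellsuggest&
<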

==============================================================================
3. Generating a spell file                              *spell-mkspell*

Vim uses a binary file format for spelling. This greatly speeds up loading
the word list and keeps it small.
                                                *.aff* *.dic* *Myspell*
You can create a Vim spell file from the .aff and .dic files that Myspell
uses. Myspell is used by OpenOffice.org and Mozilla. You should be able to
find them here:
        http://lingucomponent.openoffice.org/spell_dic.html
You can also use a plain word list. The results are the same, the choice
depends on what word lists you can find.

If you install Aap (from www.a-a-p.org) you can use the recipes in the
runtime/spell/??/ directories. Aap will take care of downloading the files,
applying the patches needed for Vim and building the .spl file.

Make sure your current locale is set properly, otherwise Vim doesn't know what
characters are upper/lower case letters. If the locale isn't available (e.g.,
when using an MS-Windows codepage on Unix) add tables to the .aff file
|spell-affix-chars|. If the .aff file doesn't define a table then the word
table of the currently active spelling is used. If spelling is not active
then Vim will try to guess.

                                                        *:mksp* *:mkspell*
:mksp[ell][!] [-ascii] {outname} {inname} ...
                        Generate a Vim spell file from word lists. Example: >
                :mkspell /tmp/nl nl_NL.words
<                                                       *E751*
                        When {outname} ends in ".spl" it is used as the output
                        file name. Otherwise it should be a language name,
                        such as "en", without the region name. The file
                        written will be "{outname}.{encoding}.spl", where
                        {encoding} is the value of the 'encoding' option.

                        When the output file already exists [!] must be used
                        to overwrite it.

                        When the [-ascii] argument is present, words with
                        non-ascii characters are skipped. The resulting file
                        ends in "ascii.spl".

                        The input can be the Myspell format files {inname}.aff
                        and {inname}.dic. If {inname}.aff does not exist then
                        {inname} is used as the file name of a plain word
                        list.

                        Multiple {inname} arguments can be given to combine
                        regions into one Vim spell file. Example: >
                :mkspell ~/.vim/spell/en /tmp/en_US /tmp/en_CA /tmp/en_AU
<                       This combines the English word lists for US, CA and AU
                        into one en.spl file.
                        Up to eight regions can be combined. *E754* *E755*
                        The REP and SAL items of the first .aff file where
                        they appear are used. |spell-affix-REP|
                        |spell-affix-SAL|

                        This command uses a lot of memory, required to find
                        the optimal word tree (Polish requires a few hundred
                        Mbyte). The final result will be much smaller.

                        After the spell file was written and it was being used
                        in a buffer it will be reloaded automatically.

:mksp[ell] [-ascii] {name}.{enc}.add
                        Like ":mkspell" above, using {name}.{enc}.add as the
                        input file and producing an output file in the same
                        directory that has ".spl" appended.

:mksp[ell] [-ascii] {name}
                        Like ":mkspell" above, using {name} as the input file
                        and producing an output file in the same directory
                        that has ".{enc}.spl" appended.
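
For example, to regenerate the .spl file for a personal word list after
editing it by hand (the file name is an assumption made for this example): >

        :mkspell! ~/.vim/spell/en.latin1.add
<This would write "~/.vim/spell/en.latin1.add.spl".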

Vim will report the number of duplicate words. This might be a mistake in the
list of words. But sometimes it is used to have different prefixes and
suffixes for the same basic word to avoid them combining (e.g. Czech uses
this).

Since you might want to change a Myspell word list for use with Vim the
following procedure is recommended:

1. Obtain the xx_YY.aff and xx_YY.dic files from Myspell.
2. Make a copy of these files to xx_YY.orig.aff and xx_YY.orig.dic.
3. Change the xx_YY.aff and xx_YY.dic files to remove bad words, add missing
   words, define word characters with FOL/LOW/UPP, etc. The distributed
   "src/spell/*.diff" files can be used.
4. Start Vim with the right locale and use |:mkspell| to generate the Vim
   spell file.
5. Try out the spell file with ":set spell spelllang=xx" if you wrote it in
   a spell directory in 'runtimepath', or ":set spelllang=xx.enc.spl" if you
   wrote it somewhere else.
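
Steps 2, 4 and 5 can be sketched as follows, typed in Vim on a Unix-like
system. "xx_YY" stands for the real file names, and the output directory is
assumed to be a spell directory in 'runtimepath': >

        :!cp xx_YY.aff xx_YY.orig.aff
        :!cp xx_YY.dic xx_YY.orig.dic
        " (make the changes of step 3 to xx_YY.aff and xx_YY.dic here)
        :mkspell! ~/.vim/spell/xx xx_YY
        :setlocal spell spelllang=xx
<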

When the Myspell files are updated you can merge the differences:
1. Obtain the new Myspell files as xx_YY.new.aff and xx_YY.new.dic.
2. Use Vimdiff to see what changed: >
        vimdiff xx_YY.orig.dic xx_YY.new.dic
3. Take over the changes you like in xx_YY.dic.
   You may also need to change xx_YY.aff.
4. Rename xx_YY.new.dic to xx_YY.orig.dic and xx_YY.new.aff to
   xx_YY.orig.aff.


SPELL FILE DUMP

If for some reason you want to check what words are supported by the currently
used spelling files, use this command:

                                                        *:spelldump* *:spelld*
:spelld[ump]            Open a new window and fill it with all currently valid
                        words.
                        Note: For some languages the result may be enormous,
                        causing Vim to run out of memory.

The dump uses the word list format |spell-wordlist-format|. You should be
able to read it with ":mkspell" to generate one .spl file that includes all
the words.
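
A sketch of that round trip; the file names under /tmp are arbitrary: >

        :spelldump
        :write /tmp/all.words
        :mkspell /tmp/all /tmp/all.words
<The last command would write "/tmp/all.{encoding}.spl".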

When all entries to 'spelllang' use the same regions or no regions at all then
the region information is included in the dumped words. Otherwise only words
for the current region are included and no "/regions" line is generated.

Comment lines with the name of the .spl file are used as a header above the
words that were generated from that .spl file.

Bram Moolenaar13fcaaf2005-04-15 21:13:42 +0000502==============================================================================
Bram Moolenaard042c562005-06-30 22:04:15 +00005034. Spell file format *spell-file-format*
Bram Moolenaar217ad922005-03-20 22:37:15 +0000504
Bram Moolenaar13fcaaf2005-04-15 21:13:42 +0000505This is the format of the files that are used by the person who creates and
506maintains a word list.
Bram Moolenaar217ad922005-03-20 22:37:15 +0000507
Bram Moolenaar13fcaaf2005-04-15 21:13:42 +0000508Note that we avoid the word "dictionary" here. That is because the goal of
509spell checking differs from writing a dictionary (as in the book). For
Bram Moolenaard042c562005-06-30 22:04:15 +0000510spelling we need a list of words that are OK, thus should not to be
511highlighted. Person and company names will not appear in a dictionary, but do
512appear in a word list. And some old words are rarely used while they are
513common misspellings. These do appear in a dictionary but not in a word list.
Bram Moolenaar217ad922005-03-20 22:37:15 +0000514
Bram Moolenaar7d1f5db2005-07-03 21:39:27 +0000515There are two formats: A straight list of words and a list using affix
Bram Moolenaard042c562005-06-30 22:04:15 +0000516compression. The files with affix compression are used by Myspell (Mozilla
517and OpenOffice.org). This requires two files, one with .aff and one with .dic
518extension.
Bram Moolenaar75c50c42005-06-04 22:06:24 +0000519

FORMAT OF STRAIGHT WORD LIST				*spell-wordlist-format*

The words must appear one per line.  That is all that is required.

Additionally the following items are recognized:

- Empty and blank lines are ignored.

- Lines starting with a # are ignored (comment lines).

- A line starting with "/encoding=", before any word, specifies the encoding
  of the file.  After the '=' comes an encoding name.  This tells Vim to set
  up conversion from the specified encoding to 'encoding'.  Thus you can use
  one word list for several target encodings.

- A line starting with "/regions=" specifies the region names that are
  supported.  Each region name must be two ASCII letters.  The first one is
  region 1.  Thus "/regions=usca" has region 1 "us" and region 2 "ca".
  In an addition word list the region names should be equal to those in the
  main word list!

- Other lines starting with '/' are reserved for future use.  The ones that
  are not recognized are ignored (but you do get a warning message).

- A "/" may follow the word with the following items:
    =		Case must match exactly.
    ?		Rare word.
    !		Bad (wrong) word.
    digit	A region in which the word is valid.  If no regions are
		specified the word is valid in all regions.

Example:

	# This is an example word list		comment
	/encoding=latin1			encoding of the file
	/regions=uscagb				regions "us", "ca" and "gb"
	example					word for all regions
	blah/12					word for regions "us" and "ca"
	vim/!					bad word
	Campbell/?3				rare word in region 3 "gb"
	's mornings/=				keep-case word

Note that when "/=" is used the same word with all upper-case letters is not
accepted.  This is different from a word with mixed case that is automatically
marked as keep-case; those words may appear in all upper-case letters.

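The rules above can be sketched in Python.  This is only an illustration of
the format, not how Vim reads the file: the function name and the dictionary
keys are made up for this example, and it assumes the lines contain no
trailing remarks like the ones shown in the right-hand column above.

```python
def parse_word_list(lines):
    """Parse the lines of a straight word list into (settings, entries)."""
    settings = {"encoding": None, "regions": []}
    entries = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue                    # blank lines and comments are ignored
        if line.startswith("/encoding="):
            settings["encoding"] = line[len("/encoding="):].strip()
        elif line.startswith("/regions="):
            names = line[len("/regions="):].strip()
            # every region name is two ASCII letters; the first is region 1
            settings["regions"] = [names[i:i + 2]
                                   for i in range(0, len(names), 2)]
        elif line.startswith("/"):
            pass                        # reserved; Vim gives a warning here
        else:
            word, _, flags = line.partition("/")
            entries.append({
                "word": word,
                "keep_case": "=" in flags,
                "rare": "?" in flags,
                "bad": "!" in flags,
                # an empty list means: valid in all regions
                "regions": [int(c) for c in flags if c in "0123456789"],
            })
    return settings, entries
```

Feeding it the example list gives the regions "us", "ca" and "gb", and marks
"blah" as valid in regions 1 and 2 only.
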

FORMAT WITH AFFIX COMPRESSION

There are two files: the basic word list and an affix file.  The affixes are
used to modify the basic words to get the full word list.  This significantly
reduces the number of words, especially for a language like Polish.  This is
called affix compression.

The format for the affix and word list files is mostly identical to what
Myspell uses (the spell checker of Mozilla and OpenOffice.org).  A description
can be found here:
	http://lingucomponent.openoffice.org/affix.readme ~
Note that affixes are case sensitive; this isn't obvious from the description.

Vim supports a few extras.  Hopefully Myspell will support these too some day.
See |spell-affix-vim|.

The basic word list and the affix file are combined and turned into a binary
spell file.  All the preprocessing has been done, thus this file loads fast.
The binary spell file format is described in the source code (src/spell.c),
but only developers need to know about it.

The preprocessing also allows us to take the Myspell language files and modify
them before the Vim word list is made.  The tools for this can be found in the
"src/spell" directory.


WORD LIST FORMAT					*spell-dic-format*

A very short example, with line numbers:

	1	1234
	2	aan
	3	Als
	4	Etten-Leur
	5	et al.
	6	's-Gravenhage
	7	's-Gravenhaags
	8	bedel/P
	9	kado/1
	10	cadeau/2

The first line contains the number of words.  Vim ignores it, but you do get
an error message if it's not there.  *E760*

What follows is one word per line.  There should be no white space before or
after the word.

When the word only has lower-case letters it will also match with the word
starting with an upper-case letter.

When the word includes an upper-case letter, this means the upper-case letter
is required at this position.  The same word with a lower-case letter at this
position will not match.  When some of the other letters are upper-case it will
not match either.

The word with all upper-case characters will always be OK.

	word list	matches			does not match ~
	als		als Als ALS		ALs AlS aLs aLS
	Als		Als ALS			als ALs AlS aLs aLS
	ALS		ALS			als Als ALs AlS aLs aLS
	AlS		AlS ALS			als Als ALs aLs aLS

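The case rules in the table can be written down as a small Python function.
This is a sketch for illustration only: the function name is made up, it
relies on Python's own upper() and lower() instead of the character tables
from the affix file, and it does not handle keep-case words.

```python
def case_matches(entry, text):
    """Return True when 'text' is an accepted spelling of word list 'entry'."""
    if text == entry:
        return True                 # identical case always matches
    if text == entry.upper():
        return True                 # all upper-case is always OK
    if entry == entry.lower():
        # an all lower-case entry also matches with the first letter
        # in upper-case
        return text == entry[:1].upper() + entry[1:]
    return False                    # any other mix of case is rejected
```

For the entry "als" this accepts "als", "Als" and "ALS" and rejects "ALs";
for the entry "Als" it rejects "als".
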
The KEP affix ID can be used to specifically match a word with identical case
only, see below |spell-affix-KEP|.

Note that in lines 5 to 7 non-word characters are used.  You can include
any character in a word.  When checking the text a word still only matches
when it appears with a non-word character before and after it.  For Myspell a
word starting with a non-word character probably won't work.

After the word there is an optional slash and flags.  Most of these flags are
letters that indicate the affixes that can be used with this word.  These are
specified with SFX and PFX lines in the .aff file.  See the Myspell
documentation.

							*spell-affix-vim*
A flag that Vim adds and is not in Myspell is the flag defined with KEP in the
affix file.  This has the meaning that case matters.  This can be used if the
word does not have the first letter in upper case at the start of a sentence.
Example (assuming that = was used for KEP):

	word list	matches			does not match ~
	's morgens/=	's morgens		'S morgens 's Morgens 'S MORGENS
	's Morgens	's Morgens 'S MORGENS	'S morgens 's morgens

The flag can also be used to prevent the word from matching when it is in all
upper-case letters.

							*spell-affix-mbyte*
The basic word list is normally in an 8-bit encoding, which is mentioned in
the affix file.  The affix file must always be in the same encoding as the
word list.  This is compatible with Myspell.  For Vim the encoding may also be
something else, any encoding that "iconv" supports.  The "SET" line must
specify the name of the encoding.  When using a multi-byte encoding it's
possible to use more distinct affixes (but Myspell doesn't support that, thus
you may not want to use it anyway).


CHARACTER TABLES
							*spell-affix-chars*
When using an 8-bit encoding (as specified with SET) the affix file should
define what characters are word characters.  This is because the system where
":mkspell" is used may not support a locale with this encoding and isalpha()
won't work.  For example when using "cp1250" on Unix.

						*E761* *E762* *spell-affix-FOL*
					*spell-affix-LOW* *spell-affix-UPP*
Three lines in the affix file are needed.  Simplistic example:

	FOL  áëñ ~
	LOW  áëñ ~
	UPP  ÁËÑ ~

All three lines must have exactly the same number of characters.

The "FOL" line specifies the case-folded characters.  These are used to
compare words while ignoring case.  For most encodings this is identical to
the lower case line.

The "LOW" line specifies the characters in lower-case.  Mostly it's equal to
the "FOL" line.

The "UPP" line specifies the characters with upper-case.  That is, a character
is upper-case where it's different from the character at the same position in
"FOL".

ASCII characters should be omitted; Vim always handles these in the same way.
When the encoding is UTF-8 no word characters need to be specified.

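How the three lines relate can be shown with a short Python sketch.  It only
illustrates the position-by-position pairing described above; the function
name and the two returned dictionaries are assumptions made for this example
and say nothing about how Vim stores the tables.

```python
def build_case_tables(fol, low, upp):
    """Pair up the characters of the FOL, LOW and UPP lines by position."""
    if not (len(fol) == len(low) == len(upp)):
        raise ValueError("FOL, LOW and UPP must have the same number "
                         "of characters")
    fold = {}        # character -> case-folded character
    to_upper = {}    # lower-case character -> upper-case character
    for f, l, u in zip(fol, low, upp):
        fold[l] = f
        fold[u] = f
        to_upper[l] = u
    return fold, to_upper


fold, to_upper = build_case_tables("áëñ", "áëñ", "ÁËÑ")
```

With the example lines, fold maps both "á" and "Á" to "á", so the two compare
as equal when case is ignored.
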
								*E763*
Vim allows you to use spell checking for several languages in the same file.
You can list them in the 'spelllang' option.  As a consequence all spell files
for the same encoding must use the same word characters, otherwise they can't
be combined without errors.  If you get a warning that the word tables differ
you may need to generate the .spl file again with |:mkspell|.  Check the FOL,
LOW and UPP lines in the used .aff file.

The XX.ascii.spl spell file generated with the "-ascii" argument will not
contain the table with characters, so that it can be combined with spell files
for any encoding.  The .add.spl files also do not contain the table.


MID-WORD CHARACTERS
							*spell-midword*
Some characters are only to be considered word characters if they are used in
between two ordinary word characters.  An example is the single quote: It is
often used to put text in quotes, thus it can't be recognized as a word
character, but when it appears in between word characters it must be part of
the word.  This is needed to detect a spelling error such as they'are.  That
should be they're, but since "they" and "are" are words themselves that would
go unnoticed.

These characters are defined with MIDWORD in the .aff file:

	MIDWORD	'- ~

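The effect of MIDWORD on splitting text into words can be sketched in Python.
This is an illustration under assumptions: the function name is made up, and
Python's isalpha() stands in for the word character tables of the spell file.

```python
def split_words(text, midword="'-"):
    """Split text into words; MIDWORD characters count only inside a word."""
    words = []
    current = ""
    for i, ch in enumerate(text):
        if ch.isalpha():
            current += ch
        elif (ch in midword and current
              and i + 1 < len(text) and text[i + 1].isalpha()):
            # a mid-word character between two word characters
            current += ch
        else:
            if current:
                words.append(current)
            current = ""
    if current:
        words.append(current)
    return words
```

The text "'they'are' quoted" gives the words "they'are" and "quoted": the
quotes around the text are dropped, the one inside the word is kept, so the
mistake can be flagged.
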

AFFIXES
					*spell-affix-PFX* *spell-affix-SFX*
The usual PFX (prefix) and SFX (suffix) lines are supported (see the Myspell
documentation or the Aspell manual:
http://aspell.net/man-html/Affix-Compression.html).

Note that Myspell ignores any extra text after the relevant info.  Vim
requires this text to start with a "#" so that mistakes don't go unnoticed.
Example:

	SFX F 0 in   [^i]n      # Spion > Spionin  ~
	SFX F 0 nen  in         # Bauerin > Bauerinnen ~

An extra item for Vim is the "rare" flag.  It must come after the other
fields, before a comment.  When used, all words that use the affix will be
marked as rare words.  Example:

	PFX F 0 nene .     rare ~
	SFX F 0 oin  n     rare # hardly ever used ~

However, if the word also appears as a good word in another way it won't be
marked as rare.

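To show how the fields of an SFX line work together, here is a minimal Python
sketch.  It is an illustration only: the function names and the dictionary
layout are made up, the header line that Myspell puts before a group of SFX
lines is left out, and it assumes "#" is used for nothing but the comment.

```python
import re


def parse_sfx(line):
    """Split one SFX line into its fields, dropping the '#' comment."""
    fields = line.split("#", 1)[0].split()
    kind, flag, strip, add, cond = fields[:5]
    if kind != "SFX":
        raise ValueError("not an SFX line: " + line)
    return {
        "flag": flag,
        "strip": "" if strip == "0" else strip,   # "0" means nothing
        "add": "" if add == "0" else add,
        "cond": cond,
        "rare": "rare" in fields[5:],             # the extra item for Vim
    }


def apply_sfx(rule, word):
    """Return the word with the suffix applied, or None if it doesn't fit."""
    if not re.search("(?:" + rule["cond"] + ")$", word):
        return None                     # condition at the word end fails
    if not word.endswith(rule["strip"]):
        return None
    return word[:len(word) - len(rule["strip"])] + rule["add"]
```

With the two example lines "Spion" becomes "Spionin" by the first rule and
"Bauerin" becomes "Bauerinnen" by the second.  The first rule does not apply
to "Bauerin", because of the "[^i]n" condition.
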
						*spell-affix-PFXPOSTPONE*
When an affix file has very many prefixes that apply to many words it's not
possible to build the whole word list in memory.  This applies to Hebrew (a
list with all words is over a Gbyte).  In that case applying prefixes must be
postponed.  This makes spell checking slower.  It is indicated by this keyword
in the .aff file:

	PFXPOSTPONE ~

Only prefixes without a chop string can be postponed; prefixes with a chop
string will still be included in the word list.  An exception is when the chop
string is one character and equal to the last character of the added string,
but in lower case.  This is the case when the chop string is used to allow the
following word to start with an upper case letter.

It is not possible to use PFXPOSTPONE together with COMPOUNDFLAG or
COMPOUNDFLAGS.


WORDS WITH A SLASH					*spell-affix-SLASH*

The slash is used in the .dic file to separate the basic word from the affix
letters that can be used.  Unfortunately, this means you cannot use a slash in
a word.  Thus "TCP/IP" cannot be a word.  To work around that you can define a
replacement character for the slash.  Example:

	SLASH , ~

Now you can use "TCP,IP" to add the word "TCP/IP".

Of course, the letter used should itself not appear in any word!  The letter
must be ASCII, thus a single byte.


KEEP-CASE WORDS						*spell-affix-KEP*

In the affix file a KEP line can be used to define the affix name used for
keep-case words.  Example:

	KEP = ~

See above for an example |spell-affix-vim|.


RARE WORDS						*spell-affix-RAR*

In the affix file a RAR line can be used to define the affix name used for
rare words.  Example:

	RAR ? ~

Rare words are highlighted differently from bad words.  This is to be used for
words that are correct for the language, but are hardly ever used and could be
a typing mistake anyway.  When the same word is found as good it won't be
highlighted as rare.


BAD WORDS						*spell-affix-BAD*

In the affix file a BAD line can be used to define the affix name used for
bad words.  Example:

	BAD ! ~

This can be used to exclude words that would otherwise be good.  For example
"the the" in the .dic file:

	the the/! ~

Once a word has been marked as bad it won't be undone by encountering the same
word as good.


COMPOUND WORDS						*spell-affix-compound*

A compound word is a longer word made by concatenating words.  To specify
which words may be concatenated a character is used.  This character is put in
the list of affixes after the word.  We will call this character a flag here.
Obviously these flags must be different from any affix IDs used.

							*spell-COMPOUNDFLAG*
The Myspell compatible method uses one flag, specified with COMPOUNDFLAG.
All words with this flag combine in any order and without limit in length.
This means there is no control over which word comes first.  Example:
	COMPOUNDFLAG c ~

							*spell-COMPOUNDFLAGS*
The method added by Vim allows specifying which words can be prepended to
other words, and which words can be appended to other words.  This is a list
of comma separated items.  Each item may contain zero or more dashes and plus
signs.

NOTE: At this moment COMPOUNDFLAGS has not been implemented yet!

An item without dashes specifies words that combine in any order and as often
as possible.  Example:
	COMPOUNDFLAGS c,m ~

This allows all words with the "c" flag to be combined and all words with the
"m" flag to be combined, but a word with the "c" flag doesn't combine with a
word with the "m" flag.

Flags that are put together, without a separating comma, are considered
interchangeable.  Example:
	COMPOUNDFLAGS cm ~

This allows all words with the "c" and/or "m" flag to be combined.

An item with one dash specifies flags for a leading word and flags for a
trailing word.  Thus only two-word combinations are made.  Example:
	COMPOUNDFLAGS f-d ~

Here the 'f' flag can be used for food and 'd' for dishes, such that you can
use these words in the dictionary:
	tomato/f ~
	onion/f ~
	soup/d ~
	salat/d ~

Which makes the words:
	tomato
	onion
	soup
	salat
	tomatosoup
	tomatosalat
	onionsoup
	onionsalat

Note that something like "souptomato" is not possible.  Also note that with
only this few words it is actually easier to list them all.

More dashes can be used to allow more words to combine.  For example:
	COMPOUNDFLAGS f-d,f-f-d ~

Would allow "tomatoonionsoup" (OK, so this is a bad example, but you get the
idea).

When a word can be used an undetermined number of times use a plus instead of
a dash.  Example:
	COMPOUNDFLAGS f+d ~

Then you can make tasty "oniononiontomatotomatosoup".

The "+" may also appear at the end, which means that the last flags can be
repeated many times.  Example:
	COMPOUNDFLAGS f-d+ ~

Which allows the use of "onionsoupsoupsoupsoupsoupsoup".

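The items described above can be modelled by turning each one into a regular
expression over the flags of the words in a compound.  The Python below is
only a sketch of the semantics as described in this text (COMPOUNDFLAGS itself
is not implemented).  The function names are made up and every word is assumed
to carry exactly one compound flag.

```python
import re


def item_to_regex(item):
    """Turn one COMPOUNDFLAGS item into a regex over a string of flags."""
    # "f-d+" is split into [("f", "-"), ("d", "+")]
    groups = re.findall(r"([^-+]+)([-+]?)", item)
    if len(groups) == 1 and groups[0][1] == "":
        # no dashes: these flags combine in any order, any number of times
        return "[" + re.escape(groups[0][0]) + "]{2,}"
    # a "+" after a group of flags means that word can be repeated
    return "".join("[" + re.escape(flags) + "]" + ("+" if sep == "+" else "")
                   for flags, sep in groups)


def is_valid_compound(parts, word_flags, compoundflags):
    """Check if the words in 'parts' may be concatenated in this order."""
    try:
        flag_string = "".join(word_flags[word] for word in parts)
    except KeyError:
        return False                # a word without a compound flag
    return any(re.fullmatch(item_to_regex(item), flag_string)
               for item in compoundflags.split(","))


FOOD = {"tomato": "f", "onion": "f", "soup": "d", "salat": "d"}
```

With these flags is_valid_compound(["tomato", "soup"], FOOD, "f-d") is true,
while ["soup", "tomato"] is rejected.
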
							*spell-COMPOUNDMIN*
The minimal length of a word used for concatenation is specified with
COMPOUNDMIN.  Example:
	COMPOUNDMIN 5 ~

When omitted a minimal length of 3 bytes is used.  Obviously you could just
leave out the compound flag from short words instead; this feature is present
for compatibility with Myspell.

							*spell-CMP*
NOTE: At this moment CMP has not been implemented yet!

Sometimes it is necessary to change a word when concatenating it to another,
by removing a few letters, inserting something or both.  It can also be useful
to restrict concatenation to words that match a pattern.  For this purpose CMP
items can be used.  They look like this:
	CMP {flag} {flags} {strip} {add} {cond} {cond2}

	{flag}		the flag, as used in COMPOUNDFLAGS for the lead word
	{flags}		accepted flags for the following word ('.' to accept
			all)
	{strip}		text to remove from the end of the lead word (zero
			for no stripping)
	{add}		text to insert between the words (zero for no
			addition)
	{cond}		condition to match at the end of the lead word
	{cond2}		condition to match at the start of the following word

This is the same as what is used for SFX and PFX items, with the extra {flags}
and {cond2} fields.  Example:
	CMP f mrt 0 - . . ~

When used with the food and dish word list above, this means that a dash is
inserted after each food item.  Thus you get "onion-soup" and
"onion-tomato-salat".

When there are CMP items for a compound flag the concatenation is only done
when a CMP item matches.

When there are no CMP items for a compound flag, then all words will be
concatenated, as if there was an item:
	CMP {flag} . 0 0 . .


REPLACEMENTS						*spell-affix-REP*

In the affix file REP items can be used to define common mistakes.  This is
used to make spelling suggestions.  The items define the "from" text and the
"to" replacement.  Example:

	REP 4 ~
	REP f ph ~
	REP ph f ~
	REP k ch ~
	REP ch k ~

The first line specifies the number of REP lines following.  Vim ignores it.
Don't include simple one-character replacements or swaps.  Vim will try these
anyway.  You can include whole words if you want to, but you might want to use
the "file:" item in 'spellsuggest' instead.

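How REP items lead to candidate words can be sketched in Python.  This only
shows the idea of replacing "from" by "to" at each position.  It is not Vim's
suggestion mechanism, which also scores and filters the results, and the
function names are made up for this example.

```python
def parse_rep(lines):
    """Collect the (from, to) pairs; the count line is skipped."""
    pairs = []
    for line in lines:
        fields = line.split()
        if len(fields) == 3 and fields[0] == "REP":
            pairs.append((fields[1], fields[2]))
    return pairs


def rep_candidates(word, pairs):
    """Apply every REP pair at every position where 'from' occurs."""
    candidates = []
    for frm, to in pairs:
        start = word.find(frm)
        while start != -1:
            candidate = word[:start] + to + word[start + len(frm):]
            if candidate not in candidates:
                candidates.append(candidate)
            start = word.find(frm, start + 1)
    return candidates


PAIRS = parse_rep(["REP 4", "REP f ph", "REP ph f", "REP k ch", "REP ch k"])
```

For the bad word "fone" this produces "phone", and for "skool" it produces
"school".  A real checker would then keep only the candidates that are good
words.
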

SIMILAR CHARACTERS					*spell-affix-MAP*

In the affix file MAP items can be used to define letters that are very much
alike.  This is mostly used for a letter with different accents.  This is used
to prefer suggestions with these letters substituted.  Example:

	MAP 2 ~
	MAP eéëêè ~
	MAP uüùúû ~

The first line specifies the number of MAP lines following.  Vim ignores it.

Each letter must appear in only one of the MAP items.  It's a bit more
efficient if the first letter is ASCII or at least one without accents.


SOUND-A-LIKE						*spell-affix-SAL*

In the affix file SAL items can be used to define the sounds-a-like mechanism
to be used.  The main items define the "from" text and the "to" replacement.
Simplistic example:

	SAL CIA			 X ~
	SAL CH			 X ~
	SAL C			 K ~
	SAL K			 K ~

There are a few rules and this can become quite complicated.  An explanation
of how it works can be found in the Aspell manual:
http://aspell.net/man-html/Phonetic-Code.html.

There are a few special items:

	SAL followup		true ~
	SAL collapse_result	true ~
	SAL remove_accents	true ~

"1" has the same meaning as "true".  Any other value means "false".


SIMPLE SOUNDFOLDING		*spell-affix-SOFOFROM* *spell-affix-SOFOTO*

The SAL mechanism is complex and slow.  A simpler mechanism is mapping all
characters to another character, mapping similar sounding characters to the
same character.  At the same time this does case folding.  You cannot have
both SAL items and simple soundfolding.

There are two items required: one to specify the characters that are mapped
and one that specifies the characters they are mapped to.  They must have
exactly the same number of characters.  Example:

	SOFOFROM abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ ~
	SOFOTO   ebctefghejklnnepkrstevvkesebctefghejklnnepkrstevvkes ~

In the example all vowels are mapped to the same character 'e'.  Another
method would be to leave out all vowels.  Some characters that sound nearly
the same and are often mixed up, such as 'm' and 'n', are mapped to the same
character.  Don't do this too much, or all words will start looking alike.

Characters that do not appear in SOFOFROM will be left out, except that all
white space is replaced by one space.  Sequences of the same character in
SOFOFROM are replaced by one.

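The mapping can be tried out with a few lines of Python.  This is a sketch
for illustration, not the code Vim uses: the function name is made up, and
the rule about sequences is read here as "a result character that repeats the
previous result character is dropped", which is an assumption.

```python
SOFOFROM = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
SOFOTO = "ebctefghejklnnepkrstevvkesebctefghejklnnepkrstevvkes"


def simple_soundfold(text, sofofrom=SOFOFROM, sofoto=SOFOTO):
    """Sound-fold text with a character to character mapping."""
    if len(sofofrom) != len(sofoto):
        raise ValueError("SOFOFROM and SOFOTO must have the same number "
                         "of characters")
    table = dict(zip(sofofrom, sofoto))
    result = []
    for ch in text:
        if ch.isspace():
            folded = " "            # all white space becomes one space
        elif ch in table:
            folded = table[ch]
        else:
            continue                # not in SOFOFROM: left out
        if not result or result[-1] != folded:
            result.append(folded)   # a repeated character is kept only once
    return "".join(result)
```

With the example items both "Moon  light" and "moan lyght" fold to
"nen leght", so the two are considered to sound alike.
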
You can use the |soundfold()| function to try out the results.  Or set the
'verbose' option to see the score in the output of the |z?| command.


 vim:tw=78:sw=4:ts=8:ft=help:norl: