*spell.txt*	For Vim version 7.0aa.  Last change: 2005 Aug 19


		  VIM REFERENCE MANUAL	  by Bram Moolenaar


Spell checking						*spell*

1. Quick start			|spell-quickstart|
2. Remarks on spell checking	|spell-remarks|
3. Generating a spell file	|spell-mkspell|
4. Spell file format		|spell-file-format|

{Vi does not have any of these commands}

Spell checking is not available when the |+syntax| feature has been disabled
at compile time.

==============================================================================
1. Quick start						*spell-quickstart*

This command switches on spell checking: >

	:setlocal spell spelllang=en_us

This switches on the 'spell' option and specifies to check for US English.

The words that are not recognized are highlighted with one of these:
	SpellBad	word not recognized			|hl-SpellBad|
	SpellCap	word not capitalised			|hl-SpellCap|
	SpellRare	rare word				|hl-SpellRare|
	SpellLocal	wrong spelling for selected region	|hl-SpellLocal|

Vim only checks words for spelling; there is no grammar check.

If the 'mousemodel' option is set to "popup" and the cursor is on a badly
spelled word, or it is "popup_setpos" and the mouse pointer is on a badly
spelled word, then the popup menu will contain a submenu to replace the bad
word.  Note: this slows down the appearance of the popup menu.

To search for the next misspelled word:

							*]s* *E756*
]s			Move to next misspelled word after the cursor.
			A count before the command can be used to repeat.

							*[s*
[s			Like "]s" but search backwards, find the misspelled
			word before the cursor.  Doesn't recognize words
			split over two lines, thus may stop at words that are
			not highlighted as bad.  Does not stop at a word with
			a missing capital at the start of a line.

							*]S*
]S			Like "]s" but only stop at bad words, not at rare
			words or words for another region.

							*[S*
[S			Like "]S" but search backwards.


To add words to your own word list:			*E764*

							*zg*
zg			Add word under the cursor as a good word to the first
			name in 'spellfile'.  In Visual mode the selected
			characters are added as a word (including white
			space!).  If the word is explicitly marked as a bad
			word in another spell file the result is
			unpredictable.
			A count may precede the command to indicate the entry
			in 'spellfile' to be used.  A count of two uses the
			second entry.

							*zG*
zG			Like "zg" but add the word to the internal word list
			|internal-wordlist|.

							*zw*
zw			Like "zg" but mark the word as a wrong (bad) word.

							*zW*
zW			Like "zw" but add the word to the internal word list
			|internal-wordlist|.

						*:spe* *:spellgood*
:[count]spe[llgood] {word}
			Add {word} as a good word to 'spellfile', like with
			"zg".  Without count the first name is used, with a
			count of two the second entry, etc.

:spe[llgood]! {word}	Add {word} as a good word to the internal word list,
			like with "zG".

						*:spellw* *:spellwrong*
:[count]spellw[rong] {word}
			Add {word} as a wrong (bad) word to 'spellfile', as
			with "zw".  Without count the first name is used, with
			a count of two the second entry, etc.

:spellw[rong]! {word}	Add {word} as a wrong (bad) word to the internal word
			list, like with "zW".

After adding a word to 'spellfile' with the above commands its associated
".spl" file will automatically be updated and reloaded.  If you edit a file in
'spellfile' manually you need to use the |:mkspell| command.  This sequence of
commands mostly works well: >
	:edit <file in 'spellfile'>
<	(make changes to the spell file) >
	:mkspell! %

The 'spellfile' format is described below at |spell-wordlist-format|.

							*internal-wordlist*
The internal word list is used for all buffers where 'spell' is set.  It is
not stored; it is lost when you exit Vim.  It is also cleared when 'encoding'
is set.


Finding suggestions for bad words:
							*z?*
z?			For the word under/after the cursor suggest correctly
			spelled words.  This also works to find alternatives
			for a word that is not highlighted as a bad word,
			e.g., when the word after it is bad.
			The results are sorted on similarity to the word
			under/after the cursor.
			This may take a long time.  Hit CTRL-C when you are
			bored.
			This does not work when there is a line break halfway
			through a bad word (e.g., "the the").
			You can enter the number of your choice or press
			<Enter> if you don't want to replace.  You can also
			use the mouse to click on your choice (only works if
			the mouse can be used in Normal mode and when there
			are no line wraps).  Click on the first (header) line
			to cancel.
			If 'verbose' is non-zero a score will be displayed for
			each suggestion: the higher the score, the more it
			differs from the badly spelled word.
			When a word was replaced the redo command "." will
			repeat the word replacement.  This works like "ciw",
			the good word and <Esc>.

					*:spellr* *:spellrepall* *E752* *E753*
:spellr[epall]		Repeat the replacement done by |z?| for all matches
			with the replaced word in the current window.

In Insert mode, when the cursor is after a badly spelled word, you can use
CTRL-X s to find suggestions.  This works like Insert mode completion.  Use
CTRL-N to use the next suggestion, CTRL-P to go back. |i_CTRL-X_s|

The 'spellsuggest' option influences how the list of suggestions is generated
and sorted.  See |'spellsuggest'|.

The 'spellcapcheck' option is used to check that the first word of a sentence
starts with a capital.  This doesn't work for the first word in the file.
When there is a line break right after a sentence the highlighting of the next
line may be postponed.  Use |CTRL-L| when needed.  Also see |set-spc-auto| for
how it can be set automatically when 'spelllang' is set.
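
'spellcapcheck' is a pattern that locates the end of a sentence.  A word that
follows such a match and starts with a lowercase letter is flagged.  The
Python sketch below only illustrates that idea.  SENTENCE_END is a hypothetical
approximation, not the actual default value of 'spellcapcheck'.  As in Vim, the
very first word of the text is never checked.

```python
import re

# Hypothetical approximation of a sentence-end pattern: ".", "?" or "!",
# optionally followed by closing brackets or quotes, then white space.
# This is NOT the real default value of 'spellcapcheck'.
SENTENCE_END = re.compile(r"[.?!][\])'\"]*\s+")
FIRST_WORD = re.compile(r"[^\W\d_]+")  # a run of letters


def missing_capitals(text):
    """Return words that follow a sentence end but start with a lowercase
    letter.  The first word of the text is not checked."""
    flagged = []
    for end in SENTENCE_END.finditer(text):
        word = FIRST_WORD.match(text, end.end())
        if word and word.group()[0].islower():
            flagged.append(word.group())
    return flagged
```

Because "\s" also matches a newline, a sentence that ends at a line break
affects the first word of the next line.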

==============================================================================
2. Remarks on spell checking				*spell-remarks*

PERFORMANCE

Vim does on-the-fly spell checking.  To make this work fast the word list is
loaded in memory.  Thus this uses a lot of memory (1 Mbyte or more).  There
might also be a noticeable delay when the word list is loaded, which happens
when 'spell' is set and when 'spelllang' is set while 'spell' was already set.
To minimize the delay each word list is only loaded once; it is not deleted
when 'spelllang' is made empty or 'spell' is reset.  When 'encoding' is set
all the word lists are reloaded, thus you may notice a delay then too.
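
The load-once behaviour described above can be modelled as a cache keyed by
language name that is only refreshed when 'encoding' changes.  This is a
minimal Python sketch under that assumption, not Vim's actual code.  The
"loader" callable is hypothetical and stands in for reading a .spl file.

```python
class WordListCache:
    """Toy model of the loading behaviour described above."""

    def __init__(self, loader, encoding):
        self._loader = loader      # hypothetical: (lang, encoding) -> word list
        self._encoding = encoding
        self._lists = {}           # lang -> loaded word list

    def get(self, lang):
        # Load on first use only.  Nothing is removed when a list stops being
        # used (empty 'spelllang' or 'spell' reset).
        if lang not in self._lists:
            self._lists[lang] = self._loader(lang, self._encoding)
        return self._lists[lang]

    def set_encoding(self, encoding):
        # Setting 'encoding' reloads every word list loaded so far.
        self._encoding = encoding
        for lang in list(self._lists):
            self._lists[lang] = self._loader(lang, encoding)
```

Asking again for a language that was dropped from 'spelllang' costs nothing,
which is the point of keeping the list in memory.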


REGIONS

A word may be spelled differently in various regions.  For example, English
comes in (at least) these variants:

	en		all regions
	en_au		Australia
	en_ca		Canada
	en_gb		Great Britain
	en_nz		New Zealand
	en_us		USA

Words that are not used in one region but are used in another region are
highlighted with SpellLocal |hl-SpellLocal|.

Always use lowercase letters for the language and region names.

When adding a word with |zg| or another command it's always added for all
regions.  You can change that by manually editing the 'spellfile'.  See
|spell-wordlist-format|.  Note that the regions as specified in the files in
'spellfile' are only used when all entries in 'spelllang' specify the same
region (not counting files specified by their .spl name).

							*spell-german*
Specific exception: For German these special regions are used:
	de		all German words accepted
	de_de		old and new spelling
	de_19		old spelling
	de_20		new spelling
	de_at		Austria
	de_ch		Switzerland

							*spell-yiddish*
Yiddish requires using "utf-8" encoding, because of the special characters
used.  If you are using latin1 Vim will use transliterated (romanized) Yiddish
instead.  If you want to use transliterated Yiddish with utf-8 use "yi-tr".
In a table:
	'encoding'	'spelllang'
	utf-8		yi		Yiddish
	latin1		yi		transliterated Yiddish
	utf-8		yi-tr		transliterated Yiddish


SPELL FILES						*spell-load*

Vim searches for spell files in the "spell" subdirectory of the directories in
'runtimepath'.  The name is: LL.EEE.spl, where:
	LL	the language name
	EEE	the value of 'encoding'

The value for "LL" comes from 'spelllang', but excludes the region name.
Examples:
	'spelllang'	LL ~
	en_us		en
	en-rare		en-rare
	medical_ca	medical

Only the first file is loaded, the one that is first in 'runtimepath'.  If
this succeeds then additionally files with the name LL.EEE.add.spl are loaded.
All the ones that are found are used.

Additionally, the files related to the names in 'spellfile' are loaded.  These
are the files that |zg| and |zw| add good and wrong words to.

Exceptions:
- Vim uses "latin1" when 'encoding' is "iso-8859-15".  The euro sign doesn't
  matter for spelling.
- When no spell file for 'encoding' is found "ascii" is tried.  This only
  works for languages where nearly all words are ASCII, such as English.  It
  helps when 'encoding' is not "latin1", such as iso-8859-2, and English text
  is being edited.  For the ".add" files the same name as the found main
  spell file is used.

For example, with these values:
	'runtimepath' is "~/.vim,/usr/share/vim70,~/.vim/after"
	'encoding'    is "iso-8859-2"
	'spelllang'   is "pl"

Vim will look for:
1. ~/.vim/spell/pl.iso-8859-2.spl
2. /usr/share/vim70/spell/pl.iso-8859-2.spl
3. ~/.vim/spell/pl.iso-8859-2.add.spl
4. /usr/share/vim70/spell/pl.iso-8859-2.add.spl
5. ~/.vim/after/spell/pl.iso-8859-2.add.spl

This assumes 1. is not found and 2. is found.

If 'encoding' is "latin1" Vim will look for:
1. ~/.vim/spell/pl.latin1.spl
2. /usr/share/vim70/spell/pl.latin1.spl
3. ~/.vim/after/spell/pl.latin1.spl
4. ~/.vim/spell/pl.ascii.spl
5. /usr/share/vim70/spell/pl.ascii.spl
6. ~/.vim/after/spell/pl.ascii.spl

This assumes none of them are found (Polish doesn't make sense when leaving
out the non-ASCII characters).
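
The search order above can be written down as a short Python function.  This
is a sketch under simplifying assumptions: it handles a single 'spelllang'
entry, it treats a trailing "_xx" as the region name, and the "exists"
callable (hypothetical) stands in for checking the file system.  It returns
the files in the order they are tried and the files that end up loaded.

```python
import re


def spell_file_search(runtimepath, encoding, spelllang, exists):
    """Return (tried, loaded) file names for one 'spelllang' entry."""
    ll = re.sub(r"_[a-z][a-z]$", "", spelllang)        # "en_us" -> "en"
    enc = "latin1" if encoding == "iso-8859-15" else encoding
    dirs = [d + "/spell" for d in runtimepath.split(",")]
    tried = []
    # dict.fromkeys() keeps the order and avoids trying "ascii" twice.
    for e in dict.fromkeys([enc, "ascii"]):
        for d in dirs:
            main = f"{d}/{ll}.{e}.spl"
            tried.append(main)
            if exists(main):
                # Only the first main file is used; every .add.spl file with
                # the same name, in any directory, is used as well.
                adds = [f"{a}/{ll}.{e}.add.spl" for a in dirs]
                tried.extend(adds)
                return tried, [main] + [a for a in adds if exists(a)]
    return tried, []
```

With the values from the first example, "tried" lists the five files in the
order shown.  With 'encoding' set to "latin1" and no files present, it lists
the six files of the second example.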

Spelling for EBCDIC is currently not supported.

A spell file might not be available in the current 'encoding'.  See
|spell-mkspell| about how to create a spell file.  Converting a spell file
with "iconv" will NOT work!

							*E758* *E759*
When loading a spell file Vim checks that it is properly formatted.  If you
get an error the file may be truncated, modified or intended for another Vim
version.


WORDS

Vim uses a fixed method to recognize a word.  This is independent of
'iskeyword', so that it also works in help files and for languages that
include characters like '-' in 'iskeyword'.  The word characters do depend on
'encoding'.

The table with word characters is stored in the main .spl file.  Therefore it
matters what the current locale is when generating it!  A .add.spl file does
not contain a word table though.

A word that starts with a digit is always ignored.  That includes hex numbers
in the form 0xff and 0XFF.
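
A rough Python model of the rule that words starting with a digit are
ignored.  Assumptions, not taken from Vim's source: a word is a run of letters
and digits (so '-' always splits words, whatever 'iskeyword' says), and an
apostrophe between letters is kept inside the word.  Vim's real word
characters come from the table stored in the .spl file.

```python
import re

# Hypothetical word pattern: letters/digits, optionally joined by apostrophes.
WORD = re.compile(r"[^\W_]+(?:'[^\W_]+)*")


def words_to_check(text):
    """Return the words a spell checker would look at, skipping any word
    that starts with a digit (this also covers hex numbers like 0xff)."""
    return [w for w in WORD.findall(text) if not w[0].isdigit()]
```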


WORD COMBINATIONS

It is possible to spell-check words that include a space.  This is used to
recognize words that are invalid when used by themselves, e.g. for "et al.".
It can also be used to recognize "the the" and highlight it.

The number of spaces is irrelevant.  In most cases a line break may also
appear.  However, this makes it difficult to find out where to start checking
for spelling mistakes.  When you make a change to one line and only that line
is redrawn Vim won't look in the previous line, thus when "et" is at the end
of the previous line "al." will be flagged as an error.  And when you type
"the<CR>the" the highlighting doesn't appear until the first line is redrawn.
Use |CTRL-L| to redraw right away.  "[s" will also stop at a word combination
with a line break.

When encountering a line break Vim skips characters such as '*', '>' and '"',
so that comments in C, shell and Vim code can be spell checked.
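
The effect of skipping comment leaders at a line break can be shown with a
small Python sketch that looks for a multi-word entry such as "et al." or
"the the" across lines.  It is a simplified model under stated assumptions:
all lines are joined at once (Vim only looks at the previous line, as
explained above), any run of white space matches a space, and the leader
characters are limited to the three named above plus blanks.  The function
name is hypothetical.

```python
import re

LEADER_CHARS = " \t*>\""  # blanks plus the '*', '>' and '"' named above


def find_combination(lines, phrase):
    """Return the 1-based line numbers where `phrase` starts, allowing it to
    continue on the next line after a comment leader."""
    # Any amount of white space, including a line break, matches a space.
    parts = [re.escape(p) for p in phrase.split()]
    pattern = re.compile(r"(?<!\w)" + r"\s+".join(parts) + r"(?!\w)")
    # Drop comment leaders at the start of every line, then rejoin the lines.
    text = "\n".join(line.lstrip(LEADER_CHARS) for line in lines)
    return [text.count("\n", 0, m.start()) + 1 for m in pattern.finditer(text)]
```

Without the lstrip() call the " * " at the start of the second line would
prevent "the the" in a C comment from matching.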


SYNTAX HIGHLIGHTING					*spell-syntax*

Files that use syntax highlighting can specify where spell checking should be
done:

1.  everywhere			   default
2.  in specific items		   use "contains=@Spell"
3.  everywhere but specific items  use "contains=@NoSpell"

For the second method adding the @NoSpell cluster will disable spell checking
again.  This can be used, for example, to add @Spell to the comments of a
program, and add @NoSpell for items that shouldn't be checked.


VIM SCRIPTS

If you want to write a Vim script that does something with spelling, you may
find these functions useful:

    spellbadword()	find badly spelled word at the cursor
    spellsuggest()	get list of spelling suggestions
    soundfold()		get the sound-alike version of a word


SETTING 'spellcapcheck' AUTOMATICALLY			*set-spc-auto*

After the 'spelllang' option has been set successfully, Vim will source the
files "spell/LANG.vim" in 'runtimepath'.  "LANG" is the value of 'spelllang'
up to the first comma, dot or underscore.  This can be used to set options
specifically for the language, especially 'spellcapcheck'.

The distribution includes a few of these files.  Use this command to see what
they do: >
	:next $VIMRUNTIME/spell/*.vim

Note that the default scripts don't set 'spellcapcheck' if it was changed from
the default value.  This assumes the user prefers another value then.
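
How "LANG" is derived from 'spelllang' can be sketched in Python.  The helper
below is hypothetical; it assumes that a matching script is looked for in the
"spell" subdirectory of every 'runtimepath' directory, in order, and it returns
those candidate paths.

```python
import re


def spell_lang_scripts(spelllang, runtimepath):
    """Return candidate "spell/LANG.vim" paths for the given 'spelllang'."""
    # LANG is 'spelllang' up to the first comma, dot or underscore.
    lang = re.split(r"[,._]", spelllang, maxsplit=1)[0]
    if not lang:
        return []
    return [f"{d}/spell/{lang}.vim" for d in runtimepath.split(",")]
```

For example, with 'spelllang' set to "en_us,nl" only "en.vim" is considered;
a "-" is not a separator, so "en-rare" looks for "en-rare.vim".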


DOUBLE SCORING						*spell-double-scoring*

The 'spellsuggest' option can be used to select "double" scoring.  This
mechanism is based on the principle that there are two kinds of spelling
mistakes:

1. You know how to spell the word, but mistype something.  This results in a
   small editing distance (character swapped/omitted/inserted) and possibly a
   word that sounds completely different.

2. You don't know how to spell the word and type something that sounds right.
   The edit distance can be big but the word is similar after sound-folding.

Since scores for these two mistakes will be very different we use a list
for each and mix them.

The sound-folding is slow and people that know the language won't make the
second kind of mistake.  Therefore 'spellsuggest' can be set to select the
preferred method for scoring the suggestions.
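
The idea of keeping two ranked lists and mixing them can be illustrated with
a short Python sketch.  It is not Vim's scoring: edit_distance() is a plain
Levenshtein distance, toy_fold() is a crude hypothetical stand-in for real
sound-folding (it does not follow the rules used by |soundfold()|), and the
two lists are mixed by simply alternating between them.

```python
def edit_distance(a, b):
    """Levenshtein distance: number of inserted, deleted or changed chars."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb)))  # change ca into cb
        prev = cur
    return prev[-1]


def toy_fold(word):
    """Crude hypothetical sound-folding: a few spelling rewrites, drop vowels
    after the first letter and collapse doubled letters."""
    w = word.lower()
    for src, dst in (("ph", "f"), ("ck", "k"), ("c", "k"), ("q", "k"), ("z", "s")):
        w = w.replace(src, dst)
    w = w[:1] + "".join(ch for ch in w[1:] if ch not in "aeiouy")
    return "".join(ch for i, ch in enumerate(w) if i == 0 or ch != w[i - 1])


def double_score(bad, candidates, limit=10):
    """Alternate between the best typing-error and best sound-alike matches."""
    by_typo = sorted(candidates, key=lambda c: (edit_distance(bad, c), c))
    folded = toy_fold(bad)
    by_sound = sorted(candidates,
                      key=lambda c: (edit_distance(folded, toy_fold(c)), c))
    mixed = []
    for pair in zip(by_typo, by_sound):
        for cand in pair:
            if cand not in mixed:
                mixed.append(cand)
    return mixed[:limit]
```

Here "physics" is four edits away from "fisiks" but folds to the same sound
form, which is the second kind of mistake described above.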

==============================================================================
3. Generating a spell file				*spell-mkspell*

Vim uses a binary file format for spelling.  This greatly speeds up loading
the word list and keeps it small.
						    *.aff* *.dic* *Myspell*
You can create a Vim spell file from the .aff and .dic files that Myspell
uses.  Myspell is used by OpenOffice.org and Mozilla.  You should be able to
find them here:
	http://lingucomponent.openoffice.org/spell_dic.html
You can also use a plain word list.  The results are the same; the choice
depends on what word lists you can find.

If you install Aap (from www.a-a-p.org) you can use the recipes in the
runtime/spell/??/ directories.  Aap will take care of downloading the files,
applying patches needed for Vim and building the .spl file.

Make sure your current locale is set properly, otherwise Vim doesn't know what
characters are upper/lower case letters.  If the locale isn't available (e.g.,
when using an MS-Windows codepage on Unix) add tables to the .aff file
|spell-affix-chars|.  If the .aff file doesn't define a table then the word
table of the currently active spelling is used.  If spelling is not active
then Vim will try to guess.

							*:mksp* *:mkspell*
:mksp[ell][!] [-ascii] {outname} {inname} ...
			Generate a Vim spell file from word lists.  Example: >
		:mkspell /tmp/nl nl_NL.words
<								*E751*
			When {outname} ends in ".spl" it is used as the output
			file name.  Otherwise it should be a language name,
			such as "en", without the region name.  The file
			written will be "{outname}.{encoding}.spl", where
			{encoding} is the value of the 'encoding' option.

			When the output file already exists [!] must be used
			to overwrite it.

			When the [-ascii] argument is present, words with
			non-ascii characters are skipped.  The resulting file
			ends in "ascii.spl".

			The input can be the Myspell format files {inname}.aff
			and {inname}.dic.  If {inname}.aff does not exist then
			{inname} is used as the file name of a plain word
			list.

			Multiple {inname} arguments can be given to combine
			regions into one Vim spell file.  Example: >
		:mkspell ~/.vim/spell/en /tmp/en_US /tmp/en_CA /tmp/en_AU
<			This combines the English word lists for US, CA and AU
			into one en.spl file.
			Up to eight regions can be combined.  *E754* *E755*
			The REP and SAL items of the first .aff file where
			they appear are used. |spell-affix-REP|
			|spell-affix-SAL|

			This command uses a lot of memory, required to find
			the optimal word tree (Polish, Italian and Hungarian
			require several hundred Mbyte).  The final result will
			be much smaller, because compression is used.  To
			avoid running out of memory compression will be done
			now and then.  This can be tuned with the 'mkspellmem'
			option.

			After the spell file was written and it was being used
			in a buffer it will be reloaded automatically.

:mksp[ell] [-ascii] {name}.{enc}.add
			Like ":mkspell" above, using {name}.{enc}.add as the
			input file and producing an output file in the same
			directory that has ".spl" appended.

:mksp[ell] [-ascii] {name}
			Like ":mkspell" above, using {name} as the input file
			and producing an output file in the same directory
			that has ".{enc}.spl" appended.
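
The rules for the output file name can be summarized in Python.  This is a
sketch of the documented naming only.  The function names are hypothetical.
Combining [-ascii] with a plain {name}, which here gives "{name}.ascii.spl", is
an assumption based on the [-ascii] description above.

```python
def mkspell_outname(outname, encoding, ascii_only=False):
    """Output file for ":mkspell {outname} {inname} ..."."""
    if outname.endswith(".spl"):
        return outname                         # used as the file name as-is
    enc = "ascii" if ascii_only else encoding  # [-ascii] gives "ascii.spl"
    return f"{outname}.{enc}.spl"


def mkspell_single(name, encoding, ascii_only=False):
    """Output file for ":mkspell" with a single file name argument."""
    if name.endswith(".add"):
        return name + ".spl"                   # {name}.{enc}.add -> ...add.spl
    return mkspell_outname(name, encoding, ascii_only)
```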

Vim will report the number of duplicate words.  This might be a mistake in the
list of words.  But sometimes it is used to have different prefixes and
suffixes for the same basic word to avoid them combining (e.g. Czech uses
this).  If you want Vim to report all duplicate words set the 'verbose'
option.
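
Counting duplicates as described above can be sketched in Python.  The
assumptions: the word is the part before any "/", so that "word/AB" and
"word/CD" (the same basic word with different affix flags) count as
duplicates, and empty lines and lines starting with "#" are skipped.
duplicate_words() is a hypothetical helper, not part of Vim.

```python
from collections import Counter


def duplicate_words(lines):
    """Return the sorted list of basic words that appear more than once."""
    words = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue                           # blank or comment line
        words.append(line.split("/", 1)[0])    # drop the "/flags" part
    counts = Counter(words)
    return sorted(w for w, n in counts.items() if n > 1)
```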

Since you might want to change a Myspell word list for use with Vim the
following procedure is recommended:

1. Obtain the xx_YY.aff and xx_YY.dic files from Myspell.
2. Make a copy of these files to xx_YY.orig.aff and xx_YY.orig.dic.
3. Change the xx_YY.aff and xx_YY.dic files to remove bad words, add missing
   words, define word characters with FOL/LOW/UPP, etc.  The distributed
   "src/spell/*.diff" files can be used.
4. Start Vim with the right locale and use |:mkspell| to generate the Vim
   spell file.
5. Try out the spell file with ":set spell spelllang=xx" if you wrote it in
   a spell directory in 'runtimepath', or ":set spelllang=xx.enc.spl" if you
   wrote it somewhere else.

When the Myspell files are updated you can merge the differences:
1. Obtain the new Myspell files as xx_YY.new.aff and xx_YY.new.dic.
2. Use Vimdiff to see what changed: >
	vimdiff xx_YY.orig.dic xx_YY.new.dic
3. Take over the changes you like in xx_YY.dic.
   You may also need to change xx_YY.aff.
4. Rename xx_YY.new.dic to xx_YY.orig.dic and xx_YY.new.aff to
   xx_YY.orig.aff.


SPELL FILE VERSIONS				*E770* *E771* *E772*

Spell checking is a relatively new feature in Vim, thus it's possible that the
.spl file format will be changed to support more languages.  Vim will check
the validity of the spell file and report anything wrong.

	E771: Old spell file, needs to be updated ~
This spell file is older than your Vim.  You need to update the .spl file.

	E772: Spell file is for newer version of Vim ~
This means the spell file was made for a later version of Vim.  You need to
update Vim.

	E770: Unsupported section in spell file ~
This means the spell file was made for a later version of Vim and contains a
section that is required for the spell file to work.  In this case it's
probably a good idea to upgrade your Vim.


SPELL FILE DUMP

If for some reason you want to check what words are supported by the currently
used spelling files, use this command:

							*:spelldump* *:spelld*
:spelld[ump]		Open a new window and fill it with all currently valid
			words.
			Note: For some languages the result may be enormous,
			causing Vim to run out of memory.

The output uses the word list format |spell-wordlist-format|.  You should be
able to read it with ":mkspell" to generate one .spl file that includes all
the words.

When all entries in 'spelllang' use the same regions or no regions at all then
the region information is included in the dumped words.  Otherwise only words
for the current region are included and no "/regions" line is generated.

Comment lines with the name of the .spl file are used as a header above the
words that were generated from that .spl file.

==============================================================================
4. Spell file format					*spell-file-format*

This is the format of the files that are used by the person who creates and
maintains a word list.

Note that we avoid the word "dictionary" here.  That is because the goal of
spell checking differs from writing a dictionary (as in the book).  For
spelling we need a list of words that are OK, thus should not be
highlighted.  Person and company names will not appear in a dictionary, but do
appear in a word list.  And some old words are rarely used while they are
common misspellings.  These do appear in a dictionary but not in a word list.

There are two formats: A straight list of words and a list using affix
compression.  The files with affix compression are used by Myspell (Mozilla
and OpenOffice.org).  This requires two files, one with .aff and one with .dic
extension.


FORMAT OF STRAIGHT WORD LIST				*spell-wordlist-format*

The words must appear one per line.  That is all that is required.

Additionally the following items are recognized:

- Empty and blank lines are ignored.

- Lines starting with a # are ignored (comment lines).

- A line starting with "/encoding=", before any word, specifies the encoding
  of the file.  After the '=' comes an encoding name.  This tells Vim to set
  up conversion from the specified encoding to 'encoding'.  Thus you can use
  one word list for several target encodings.

- A line starting with "/regions=" specifies the region names that are
  supported.  Each region name must be two ASCII letters.  The first one is
  region 1.  Thus "/regions=usca" has region 1 "us" and region 2 "ca".
  In an additional word list the region names should be equal to those of the
  main word list!

- Other lines starting with '/' are reserved for future use.  The ones that
  are not recognized are ignored (but you do get a warning message).

- A "/" may follow the word with the following items:
    =		Case must match exactly.
    ?		Rare word.
    !		Bad (wrong) word.
    digit	A region in which the word is valid.  If no regions are
		specified the word is valid in all regions.
575
Bram Moolenaar3638c682005-06-08 22:05:14 +0000576Example:
577
578 # This is an example word list comment
579 /encoding=latin1 encoding of the file
580 /regions=uscagb regions "us", "ca" and "gb"
581 example word for all regions
Bram Moolenaar1f8a5f02005-07-01 22:41:52 +0000582 blah/12 word for regions "us" and "ca"
583 vim/! bad word
584 Campbell/?3 rare word in region 3 "gb"
585 's mornings/= keep-case word
Bram Moolenaar3638c682005-06-08 22:05:14 +0000586
Bram Moolenaar0dc065e2005-07-04 22:49:24 +0000587Note that when "/=" is used the same word with all upper-case letters is not
588accepted. This is different from a word with mixed case that is automatically
589marked as keep-case, those words may appear in all upper-case letters.
590
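The rules above are simple enough to sketch in a few lines. The Python below
is only an illustration, not Vim's implementation (which is C code in
src/spell.c); the function name parse_wordlist() and the dictionary layout it
returns are made up for this example. It does not check that "/encoding="
comes before the first word, and it silently skips unknown "/" lines where Vim
gives a warning.

```python
def parse_wordlist(lines):
    """Parse a straight word list; return (encoding, regions, entries)."""
    encoding = None
    regions = []            # region N is regions[N - 1]
    entries = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue        # empty, blank and comment lines are ignored
        if line.startswith("/encoding="):
            encoding = line[len("/encoding="):]
            continue
        if line.startswith("/regions="):
            names = line[len("/regions="):]
            regions = [names[i:i + 2] for i in range(0, len(names), 2)]
            continue
        if line.startswith("/"):
            continue        # reserved for future use (Vim warns here)
        word, _, flags = line.partition("/")
        entries.append({
            "word": word,
            "keep_case": "=" in flags,
            "rare": "?" in flags,
            "bad": "!" in flags,
            # No digits means the word is valid in all regions.
            "regions": [regions[int(c) - 1] for c in flags if c.isdigit()],
        })
    return encoding, regions, entries
```

Fed the example above without the explanations in the right column, this
gives the regions "us", "ca" and "gb" and marks "Campbell" as a rare word that
is only valid in "gb".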

FORMAT WITH AFFIX COMPRESSION

There are two files: the basic word list and an affix file. The affixes are
used to modify the basic words to get the full word list. This significantly
reduces the number of words, especially for a language like Polish. This is
called affix compression.

The format for the affix and word list files is mostly identical to what
Myspell uses (the spell checker of Mozilla and OpenOffice.org). A description
can be found here:
	http://lingucomponent.openoffice.org/affix.readme ~
Note that affixes are case sensitive; this isn't obvious from the description.

Vim supports a few extras. Hopefully Myspell will support these too some day.
See |spell-affix-vim|.

The basic word list and the affix file are combined and turned into a binary
spell file. All the preprocessing has been done, thus this file loads fast.
The binary spell file format is described in the source code (src/spell.c).
Only developers need to know about it.

The preprocessing also allows us to take the Myspell language files and modify
them before the Vim word list is made. The tools for this can be found in the
"src/spell" directory.


WORD LIST FORMAT					*spell-dic-format*

A very short example, with line numbers:

	1	1234
	2	aan
	3	Als
	4	Etten-Leur
	5	et al.
	6	's-Gravenhage
	7	's-Gravenhaags
	8	bedel/P
	9	kado/1
	10	cadeau/2

The first line contains the number of words. Vim ignores it, but you do get
an error message if it's not there.					*E760*

What follows is one word per line. There should be no white space before or
after the word.

When the word only has lower-case letters it will also match with the word
starting with an upper-case letter.

When the word includes an upper-case letter, this means the upper-case letter
is required at this position. The same word with a lower-case letter at this
position will not match. When some of the other letters are upper-case it will
not match either.

The word with all upper-case characters will always be OK.

	word list	matches			does not match ~
	als		als Als ALS		ALs AlS aLs aLS
	Als		Als ALS			als ALs AlS aLs aLS
	ALS		ALS			als Als ALs AlS aLs aLS
	AlS		AlS ALS			als Als ALs aLs aLS

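A minimal Python sketch of the matching rules in the table above, assuming
plain str.upper() gives the right upper-case form (Vim uses its own character
tables, see |spell-affix-chars|). The function case_matches() is hypothetical;
keep_case=True stands for a word marked with "/=" or the KEP flag, which is
described below.

```python
def case_matches(listed, text, keep_case=False):
    """Return True if `text` found in a document is accepted for the
    word list entry `listed`."""
    if text == listed:
        return True
    if keep_case:
        return False                    # "/=" or KEP: exact case only
    if text == listed.upper():
        return True                     # all upper-case is always OK
    if listed == listed.lower():
        # A lower-case word may also start with an upper-case letter.
        return text == listed[:1].upper() + listed[1:]
    return False
```

Every row of the table, and of the keep-case table further down, can be
checked against this function.
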
The KEP affix ID can be used to specifically match a word with identical case
only; see below |spell-affix-KEP|.

Note in lines 5 to 7 that non-word characters are used. You can include
any character in a word. When checking the text a word still only matches
when it appears with a non-word character before and after it. For Myspell a
word starting with a non-word character probably won't work.

After the word there is an optional slash and flags. Most of these flags are
letters that indicate the affixes that can be used with this word. These are
specified with SFX and PFX lines in the .aff file. See the Myspell
documentation.

							*spell-affix-vim*
A flag that Vim adds and is not in Myspell is the flag defined with KEP in the
affix file. This means that case matters. This can be used if the word does
not have the first letter in upper case at the start of a sentence.
Example (assuming that = was used for KEP):

	word list	matches			does not match ~
	's morgens/=	's morgens		'S morgens 's Morgens 'S MORGENS
	's Morgens	's Morgens 'S MORGENS	'S morgens 's morgens

The flag can also be used to prevent the word from matching when it is in all
upper-case letters.

							*spell-affix-mbyte*
The basic word list is normally in an 8-bit encoding, which is mentioned in
the affix file. The affix file must always be in the same encoding as the
word list. This is compatible with Myspell. For Vim the encoding may also be
something else, any encoding that "iconv" supports. The "SET" line must
specify the name of the encoding. When using a multi-byte encoding it's
possible to use a larger number of affixes (but Myspell doesn't support that,
thus you may not want to use it anyway).


CHARACTER TABLES
							*spell-affix-chars*
When using an 8-bit encoding the affix file should define what characters are
word characters (as specified with ENC). This is because the system where
":mkspell" is used may not support a locale with this encoding and isalpha()
won't work. For example when using "cp1250" on Unix.

						*E761* *E762* *spell-affix-FOL*
						*spell-affix-LOW* *spell-affix-UPP*
Three lines in the affix file are needed. Simplistic example:

	FOL  áëñ ~
	LOW  áëñ ~
	UPP  ÁËÑ ~

All three lines must have exactly the same number of characters.

The "FOL" line specifies the case-folded characters. These are used to
compare words while ignoring case. For most encodings this is identical to
the lower case line.

The "LOW" line specifies the characters in lower-case. Mostly it's equal to
the "FOL" line.

The "UPP" line specifies the upper-case characters. That is, a character is
upper-case where it's different from the character at the same position in
"FOL".

ASCII characters should be omitted; Vim always handles these the same way.
When the encoding is UTF-8 no word characters need to be specified.

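To illustrate how the three lines relate, here is a hypothetical Python sketch
that turns them into lookup tables and folds a word. The names
make_case_tables() and fold_word() are invented; ASCII characters are handled
with str.lower() because, as said above, they are not listed.

```python
def make_case_tables(fol, low, upp):
    """Turn the FOL, LOW and UPP strings of an .aff file into tables."""
    if not len(fol) == len(low) == len(upp):
        raise ValueError("FOL, LOW and UPP must have the same length")
    fold = {}          # any case variant -> case-folded character
    upper = set()      # characters that count as upper-case
    for f, l, u in zip(fol, low, upp):
        fold[f] = fold[l] = fold[u] = f
        if u != f:     # upper-case where UPP differs from FOL
            upper.add(u)
    return fold, upper


def fold_word(word, fold):
    """Case-fold a word; ASCII is never listed, so use str.lower() for it."""
    return "".join(c.lower() if c.isascii() else fold.get(c, c) for c in word)
```

With the example lines, fold_word("ÑAÁ", fold) gives "ñaá".
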
							*E763*
Vim allows you to use spell checking for several languages in the same file.
You can list them in the 'spelllang' option. As a consequence all spell files
for the same encoding must use the same word characters; otherwise they can't
be combined without errors. If you get a warning that the word tables differ
you may need to generate the .spl file again with |:mkspell|. Check the FOL,
LOW and UPP lines in the used .aff file.

The XX.ascii.spl spell file generated with the "-ascii" argument will not
contain the table with characters, so that it can be combined with spell files
for any encoding. The .add.spl files also do not contain the table.


MID-WORD CHARACTERS
							*spell-midword*
Some characters are only to be considered word characters if they are used in
between two ordinary word characters. An example is the single quote: it is
often used to put text in quotes, thus it can't be recognized as a word
character, but when it appears in between word characters it must be part of
the word. This is needed to detect a spelling error such as "they'are". That
should be "they're", but since "they" and "are" are words themselves that
would go unnoticed.

These characters are defined with MIDWORD in the .aff file:

	MIDWORD	'- ~

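The effect of MIDWORD can be imitated with a regular expression: a mid-word
character is only accepted when a word character precedes and follows it.
This Python sketch assumes that letters and digits are the word characters,
which is a simplification of Vim's own tables; word_pattern() is a made-up
name.

```python
import re


def word_pattern(midword):
    """Regex matching one word; MIDWORD characters only count when they
    sit between two word characters."""
    wordchar = r"[^\W_]"          # letters and digits (a simplification)
    if not midword:
        return re.compile(wordchar + "+")
    mid = "[" + re.escape(midword) + "]"
    return re.compile(wordchar + "+(?:" + mid + wordchar + "+)*")
```

With MIDWORD set to '- the text "they'are 'quoted' well-known" yields the
words "they'are", "quoted" and "well-known". Without MIDWORD, "they'are" is
split into two correctly spelled words, which is exactly the mistake MIDWORD
is meant to catch.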

AFFIXES
					*spell-affix-PFX* *spell-affix-SFX*
The usual PFX (prefix) and SFX (suffix) lines are supported (see the Myspell
documentation or the Aspell manual:
http://aspell.net/man-html/Affix-Compression.html).

Note that Myspell ignores any extra text after the relevant info. Vim
requires this text to start with a "#" so that mistakes don't go unnoticed.
Example:

	SFX F 0 in   [^i]n	# Spion > Spionin ~
	SFX F 0 nen  in		# Bauerin > Bauerinnen ~

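A rough Python sketch of how such an SFX line could be split and applied,
including the Vim-only "rare" and "nocomp" items described below. It assumes
the condition can be handed to Python's re module unchanged, which works for
simple conditions like "[^i]n" but is not a full implementation of the
Myspell condition syntax. Header lines such as "SFX a Y 2" are not handled;
parse_sfx() and apply_sfx() are hypothetical names.

```python
import re


def parse_sfx(line):
    """Split 'SFX flag strip add cond [rare] [nocomp] [# comment]'."""
    body = line.split("#", 1)[0]
    fields = body.split()
    if len(fields) < 5 or fields[0] != "SFX":
        raise ValueError("not an SFX rule line: " + line)
    extra = fields[5:]
    if any(item not in ("rare", "nocomp") for item in extra):
        # Vim requires trailing text to start with "#".
        raise ValueError("unexpected text after the condition: " + line)
    flag, strip, add, cond = fields[1:5]
    return {"flag": flag,
            "strip": "" if strip == "0" else strip,
            "add": "" if add == "0" else add,
            "cond": cond,
            "rare": "rare" in extra,
            "nocomp": "nocomp" in extra}


def apply_sfx(word, rule):
    """Return the word with the suffix applied, or None if it doesn't apply."""
    # The condition must match at the end of the basic word.
    if not re.search("(?:" + rule["cond"] + ")$", word):
        return None
    if rule["strip"]:
        if not word.endswith(rule["strip"]):
            return None
        word = word[:-len(rule["strip"])]
    return word + rule["add"]
```

Note how "Bauerin" is rejected by the first rule because the character before
the final "n" is an "i", while the second rule turns it into "Bauerinnen". A
line with trailing text that does not start with "#" raises an error, like
Vim does.
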
							*spell-affix-rare*
An extra item for Vim is the "rare" flag. It must come after the other
fields, before a comment. When it is used, all words that use the affix will
be marked as rare words. Example:

	PFX F 0 nene . rare ~
	SFX F 0 oin n rare # hardly ever used ~

However, if the word also appears as a good word in another way it won't be
marked as rare.

							*spell-affix-nocomp*
Another extra item for Vim is the "nocomp" flag. It must come after the other
fields, before a comment. It can be either before or after "rare". When it is
used, words that use the affix will not be used for compound words.
Example:
	affix file:
		COMPOUNDFLAG c ~
		SFX a Y 2 ~
		SFX a 0 s . ~
		SFX a 0 ize . nocomp ~
	dictionary:
		word/c ~
		util/ac ~

This allows for "wordutil" and "wordutils" but not "wordutilize".

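The example can be checked with a small Python sketch. It hard-codes the
affix and dictionary data above in hypothetical dicts, collects the words that
may take part in a compound (skipping forms made with a "nocomp" affix), and
then tries to split a word into two or more of those parts. The minimal part
length of 3 follows the COMPOUNDMIN default |spell-COMPOUNDMIN|; this is an
illustration, not how Vim stores or searches its word tree.

```python
# Hypothetical in-memory form of the example above.
COMPOUNDFLAG = "c"
SFX = {"a": [("s", False), ("ize", True)]}   # flag -> [(added text, nocomp)]
DIC = {"word": "c", "util": "ac"}            # word -> its flags


def compound_parts(dic, sfx, compoundflag):
    """Words (basic or suffixed) that may take part in a compound."""
    parts = set()
    for base, flags in dic.items():
        if compoundflag not in flags:
            continue
        parts.add(base)
        for flag in flags:
            for add, nocomp in sfx.get(flag, []):
                if not nocomp:
                    parts.add(base + add)
    return parts


def forms_compound(word, parts, min_len=3):
    """True if word splits into two or more parts of at least min_len
    characters (COMPOUNDMIN counts bytes; the example is plain ASCII)."""
    def split(rest, count):
        if not rest:
            return count >= 2
        return any(rest[:i] in parts and split(rest[i:], count + 1)
                   for i in range(min_len, len(rest) + 1))
    return split(word, 0)
```

Here compound_parts() returns "word", "util" and "utils"; "utilize" is left
out because its suffix has "nocomp".
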
						*spell-affix-PFXPOSTPONE*
When an affix file has very many prefixes that apply to many words it's not
possible to build the whole word list in memory. This applies to Hebrew (a
list with all words is over a Gbyte). In that case applying prefixes must be
postponed. This makes spell checking slower. It is indicated by this keyword
in the .aff file:

	PFXPOSTPONE ~

Only prefixes without a chop string can be postponed; prefixes with a chop
string will still be included in the word list. An exception is when the
chop string is one character and equal to the last character of the added
string, but in lower case. This is the case when the chop string is used to
allow the following word to start with an upper-case letter.

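The postponing rule can be written as a small predicate. This Python sketch
is hypothetical: can_postpone() takes the chop and add fields of a PFX line
("0" meaning empty), and the example in the comment (chop "a", add "xA") is
invented to show the upper-case exception.

```python
def can_postpone(chop, add):
    """Return True if a PFX rule with these chop/add fields can be postponed."""
    chop = "" if chop == "0" else chop
    add = "" if add == "0" else add
    if not chop:
        return True        # prefixes without a chop string: always
    # The one exception: a single chop character equal to the last added
    # character in lower case, e.g. chop "a", add "xA": "apple" -> "xApple".
    return len(chop) == 1 and add[-1:].lower() == chop
```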

WORDS WITH A SLASH					*spell-affix-SLASH*

The slash is used in the .dic file to separate the basic word from the affix
letters that can be used. Unfortunately, this means you cannot use a slash in
a word. Thus "TCP/IP" cannot be a word. To work around that you can define a
replacement character for the slash. Example:

	SLASH , ~

Now you can use "TCP,IP" to add the word "TCP/IP".

Of course, the character used must not itself appear in any word! It must be
ASCII, thus a single byte.

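A hypothetical Python sketch of reading .dic lines with a SLASH replacement.
It splits each line at the first "/", then turns the SLASH character back into
a real slash. It also checks the word count on the first line; note that it
raises an exception where Vim only reports |E760|, and that skipping empty
lines is an assumption. parse_dic() is an invented name.

```python
def parse_dic(lines, slash=None):
    """Parse .dic lines into (word, affix flags) pairs.

    The first line must hold the word count; it is checked but otherwise
    ignored. `slash` is the character given with SLASH in the .aff file.
    """
    if not lines or not lines[0].strip().isdigit():
        raise ValueError("missing word count on the first line")
    entries = []
    for line in lines[1:]:
        line = line.rstrip("\n")
        if not line:
            continue
        word, _, flags = line.partition("/")   # the first "/" starts the flags
        if slash is not None:
            word = word.replace(slash, "/")     # "TCP,IP" -> "TCP/IP"
        entries.append((word, flags))
    return entries
```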

KEEP-CASE WORDS						*spell-affix-KEP*

In the affix file a KEP line can be used to define the affix name used for
keep-case words. Example:

	KEP = ~

See above for an example |spell-affix-vim|.


RARE WORDS						*spell-affix-RAR*

In the affix file a RAR line can be used to define the affix name used for
rare words. Example:

	RAR ? ~

Rare words are highlighted differently from bad words. This is to be used for
words that are correct for the language, but are hardly ever used and could be
a typing mistake anyway. When the same word is found as good it won't be
highlighted as rare.


BAD WORDS						*spell-affix-BAD*

In the affix file a BAD line can be used to define the affix name used for
bad words. Example:

	BAD ! ~

This can be used to exclude words that would otherwise be good. For example
"the the" in the .dic file:

	the the/! ~

Once a word has been marked as bad it won't be undone by encountering the same
word as good.

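The precedence between good, rare and bad entries can be sketched as a small
merge function. In this hypothetical Python version a good entry always wins
over a rare one, and a bad entry is never undone. Treating a bad entry that
comes after a good one as bad as well is an assumption, based on BAD being
meant to exclude words that would otherwise be good.

```python
GOOD, RARE, BAD = "good", "rare", "bad"


def add_word(status, word, kind):
    """Record `word` as GOOD, RARE or BAD in the `status` dict."""
    old = status.get(word)
    if old == BAD or kind == BAD:
        status[word] = BAD      # bad is never undone (and, assumed, wins)
    elif old == GOOD or kind == GOOD:
        status[word] = GOOD     # good wins over rare, in either order
    else:
        status[word] = RARE
```
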
							*spell-affix-NEEDAFFIX*
The NEEDAFFIX flag is used to require that a word is used with an affix. The
word itself is not a good word. Example:

	NEEDAFFIX + ~


COMPOUND WORDS						*spell-affix-compound*

A compound word is a longer word made by concatenating words that appear in
the .dic file. To specify which words may be concatenated a character is
used. This character is put in the list of affixes after the word. We will
call this character a flag here. Obviously these flags must be different from
any affix IDs used.

							*spell-COMPOUNDFLAG*
The Myspell compatible method uses one flag, specified with COMPOUNDFLAG.
All words with this flag combine in any order. This means there is no control
over which word comes first. Example:
	COMPOUNDFLAG c ~

							*spell-COMPOUNDFLAGS*
A more advanced method to specify how compound words can be formed uses
multiple items with multiple flags. This is not compatible with Myspell 3.0.
Let's start with an example:
	COMPOUNDFLAGS c+ ~
	COMPOUNDFLAGS se ~

The first line defines that words with the "c" flag can be concatenated in any
order. The second line defines compound words that are made of one word with
the "s" flag and one word with the "e" flag. With this dictionary:
	bork/c ~
	onion/s ~
	soup/e ~

You can make these words:
	bork
	borkbork
	borkborkbork
	(etc.)
	onion
	soup
	onionsoup

The COMPOUNDFLAGS item may appear multiple times. The argument is made out of
one or more groups, where each group can be:
	one flag			e.g., c
	alternate flags inside []	e.g., [abc]
Optionally this may be followed by:
	*	the group appears zero or more times, e.g., sm*e
	+	the group appears one or more times, e.g., c+

This is similar to the regexp pattern syntax (but not the same!). A few
examples with the sequence of word flags they require:
	COMPOUNDFLAGS x+	x xx xxx etc.
	COMPOUNDFLAGS yz	yz
	COMPOUNDFLAGS x+z	xz xxz xxxz etc.
	COMPOUNDFLAGS yx+	yx yxx yxxx etc.

	COMPOUNDFLAGS [abc]z	az bz cz
	COMPOUNDFLAGS [abc]+z	az aaz abaz bz baz bcbz cz caz cbaz etc.
	COMPOUNDFLAGS a[xyz]+	ax axx axyz ay ayx ayzz az azy azxy etc.
	COMPOUNDFLAGS sm*e	se sme smme smmme etc.
	COMPOUNDFLAGS s[xyz]*e	se sxe sxye sxyxe sye syze sze szye szyxe etc.

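Since the COMPOUNDFLAGS syntax is close to regexp syntax, one way to
experiment with it is to translate each item into a Python regular expression
over the flags of the words in a compound. The sketch below is hypothetical
(the names are invented, and min_len stands in for COMPOUNDMIN); it tries
every way of splitting a word into dictionary words and every choice of one
flag per word, which is fine for a demonstration but far too slow for a real
spell checker.

```python
import itertools
import re


def pattern_to_regex(pattern):
    """Translate a COMPOUNDFLAGS argument into a Python regex over flags."""
    parts, i = [], 0
    while i < len(pattern):
        if pattern[i] == "[":                        # alternate flags: [abc]
            end = pattern.index("]", i)
            parts.append("[" + re.escape(pattern[i + 1:end]) + "]")
            i = end + 1
        else:                                        # one flag
            parts.append(re.escape(pattern[i]))
            i += 1
        if i < len(pattern) and pattern[i] in "*+":  # optional repeat
            parts.append(pattern[i])
            i += 1
    return re.compile("".join(parts))


def splits(word, dic, min_len):
    """All ways to cut `word` into dictionary words of >= min_len characters."""
    if len(word) >= min_len and word in dic:
        yield [word]
    for i in range(min_len, len(word) - min_len + 1):
        if word[:i] in dic:
            for rest in splits(word[i:], dic, min_len):
                yield [word[:i]] + rest


def is_compound(word, dic, patterns, min_len=3):
    """True if `word` is two or more words whose flags match a pattern."""
    regexes = [pattern_to_regex(p) for p in patterns]
    for parts in splits(word, dic, min_len):
        if len(parts) < 2:
            continue
        # Each part contributes one of its flags to the flag sequence.
        for flags in itertools.product(*(dic[p] for p in parts)):
            if any(r.fullmatch("".join(flags)) for r in regexes):
                return True
    return False
```

With the "bork", "onion" and "soup" dictionary above this accepts
"borkborkbork" and "onionsoup" but rejects "souponion", because the flag
sequence "es" matches neither "c+" nor "se".
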
							*spell-COMPOUNDMIN*
The minimal byte length of a word used for concatenation is specified with
COMPOUNDMIN. Example:
	COMPOUNDMIN 5 ~

When omitted a minimal length of 3 bytes is used. Obviously you could just
leave out the compound flag from short words instead; this feature is present
for compatibility with Myspell.

							*spell-COMPOUNDMAX*
The maximum number of words that can be concatenated into a compound word is
specified with COMPOUNDMAX. Example:
	COMPOUNDMAX 3 ~

When omitted there is no maximum. It applies to all compound words.

To set a limit for words with specific flags make sure the items in
COMPOUNDFLAGS where they appear don't allow too many words.

							*spell-COMPOUNDSYLMAX*
The maximum number of syllables that a compound word may contain is specified
with COMPOUNDSYLMAX. Example:
	COMPOUNDSYLMAX 6 ~

This has no effect if there is no SYLLABLE item. Without COMPOUNDSYLMAX there
is no limit on the number of syllables.

							*spell-SYLLABLE*
The SYLLABLE item defines characters or character sequences that are used to
count the number of syllables in a word. Example:
	SYLLABLE aáeéiíoóöõuúüûy/aa/au/ea/ee/ei/ie/oa/oe/oo/ou/uu/ui ~

Before the first slash is the set of characters that are counted for one
syllable, also when repeated and mixed, until the next character that is not
in this set. After the slashes come sequences of characters that are counted
for one syllable. These are preferred over using characters from the set.
With this example "ideeen" has three syllables, counted by "i", "ee" and "e".

Only case-folded letters need to be included.

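A Python sketch of the counting rule, checked against the "ideeen" example.
Two details are assumptions: when several sequences match, the longest one is
used, and a set character right after a sequence starts a new syllable (that
is what gives "ee" and "e" as two syllables). count_syllables() is a made-up
name.

```python
def count_syllables(word, spec):
    """Count syllables using a SYLLABLE value: a character set followed by
    slash-separated sequences."""
    charset, *sequences = spec.split("/")
    count = 0
    in_run = False    # inside a run of set characters already counted
    i = 0
    while i < len(word):
        # Sequences are preferred; take the longest one that matches here.
        match = max((s for s in sequences if s and word.startswith(s, i)),
                    key=len, default="")
        if match:
            count += 1
            in_run = False
            i += len(match)
            continue
        if word[i] in charset:
            if not in_run:
                count += 1
                in_run = True
        else:
            in_run = False
        i += 1
    return count
```
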
Another way to restrict compounding was mentioned above: adding "nocomp"
after an affix causes all words that are made with that affix not to be used
for compounding. |spell-affix-nocomp|

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
NOTE: The following has not been implemented yet, because there are no word
lists that support this.
>							*spell-CMP*
> Sometimes it is necessary to change a word when concatenating it to another,
> by removing a few letters, inserting something or both. It can also be useful
> to restrict concatenation to words that match a pattern. For this purpose CMP
> items can be used. They look like this:
>	CMP {flag} {flags} {strip} {strip2} {add} {cond} {cond2}
>
>    {flag}	the flag, as used in COMPOUNDFLAGS for the lead word
>    {flags}	accepted flags for the following word ('.' to accept
>		all)
>    {strip}	text to remove from the end of the lead word (zero
>		for no stripping)
>    {strip2}	text to remove from the start of the following word
>		(zero for no stripping)
>    {add}	text to insert between the words (zero for no
>		addition)
>    {cond}	condition to match at the end of the lead word
>    {cond2}	condition to match at the start of the following word
>
> This is the same as what is used for SFX and PFX items, with the extra {flags}
> and {cond2} fields. Example:
>	CMP f mrt 0 0 - . . ~
>
> When used with a word list of foods and dishes, this means that a dash is
> inserted after each food item. Thus you get "onion-soup" and
> "onion-tomato-salad".
>
> When there are CMP items for a compound flag the concatenation is only done
> when a CMP item matches.
>
> When there are no CMP items for a compound flag, then all words will be
> concatenated, as if there were an item:
>	CMP {flag} . 0 0 0 . .
>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<


REPLACEMENTS						*spell-affix-REP*

In the affix file REP items can be used to define common mistakes. This is
used to make spelling suggestions. The items define the "from" text and the
"to" replacement. Example:

	REP 4 ~
	REP f ph ~
	REP ph f ~
	REP k ch ~
	REP ch k ~

The first line specifies the number of REP lines following. Vim ignores it.
Don't include simple one-character replacements or swaps. Vim will try these
anyway. You can include whole words if you want to, but you might want to use
the "file:" item in 'spellsuggest' instead.

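To show what a REP item contributes, this hypothetical Python sketch reads
REP lines and produces candidate words by applying one replacement at one
position. Vim would still have to check the candidates against the word list
and score them; that part is left out, and the function names are invented.

```python
def parse_rep(lines):
    """Collect (from, to) pairs from REP lines of an .aff file."""
    pairs = []
    for line in lines:
        fields = line.split()
        # "REP 4" (the ignored count) has only two fields and is skipped.
        if fields[:1] == ["REP"] and len(fields) >= 3:
            pairs.append((fields[1], fields[2]))
    return pairs


def rep_candidates(word, pairs):
    """Yield each word made by one REP replacement at one position."""
    for src, dst in pairs:
        start = word.find(src)
        while start != -1:
            yield word[:start] + dst + word[start + len(src):]
            start = word.find(src, start + 1)
```

With the example items, "fysics" produces the candidate "physics" and
"kemistry" produces "chemistry".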

SIMILAR CHARACTERS					*spell-affix-MAP*

In the affix file MAP items can be used to define letters that are very much
alike. This is mostly used for a letter with different accents. This is used
to prefer suggestions with these letters substituted. Example:

	MAP 2 ~
	MAP eéëêè ~
	MAP uüùúû ~

The first line specifies the number of MAP lines following. Vim ignores it.

Each letter must appear in only one of the MAP items. It's a bit more
efficient if the first letter is ASCII or at least one without accents.

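A small Python sketch of MAP items: each character is mapped to the item it
belongs to, which also detects a character that appears in two items, and two
words are considered similar when they only differ in characters of the same
item. The names are hypothetical and the "MAP 2" count line is assumed to have
been dropped already.

```python
def build_map(items):
    """Map every character in the MAP items to the index of its item."""
    groups = {}
    for index, chars in enumerate(items):
        for c in chars:
            if c in groups:
                raise ValueError(f"{c!r} appears in more than one MAP item")
            groups[c] = index
    return groups


def similar(a, b, groups):
    """True when a and b differ only in characters from the same MAP item."""
    if len(a) != len(b):
        return False
    return all(x == y or (x in groups and groups.get(y) == groups[x])
               for x, y in zip(a, b))
```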

SOUND-A-LIKE						*spell-affix-SAL*

In the affix file SAL items can be used to define the sound-a-like mechanism
to be used. The main items define the "from" text and the "to" replacement.
Simplistic example:

	SAL CIA			 X ~
	SAL CH			 X ~
	SAL C			 K ~
	SAL K			 K ~

There are a few rules and this can become quite complicated. An explanation
of how it works can be found in the Aspell manual:
http://aspell.net/man-html/Phonetic-Code.html.

There are a few special items:

	SAL followup		true ~
	SAL collapse_result	true ~
	SAL remove_accents	true ~

"1" has the same meaning as "true". Any other value means "false".


SIMPLE SOUNDFOLDING			*spell-affix-SOFOFROM* *spell-affix-SOFOTO*

The SAL mechanism is complex and slow. A simpler mechanism is mapping all
characters to another character, mapping similar sounding characters to the
same character. At the same time this does case folding. You cannot have
both SAL items and simple soundfolding.

Two items are required: one to specify the characters that are mapped and one
that specifies the characters they are mapped to. They must have exactly the
same number of characters. Example:

	SOFOFROM abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ ~
	SOFOTO   ebctefghejklnnepkrstevvkesebctefghejklnnepkrstevvkes ~

In the example all vowels are mapped to the same character 'e'. Another
method would be to leave out all vowels. Some characters that sound nearly
the same and are often mixed up, such as 'm' and 'n', are mapped to the same
character. Don't do this too much; all words will start looking alike.

Characters that do not appear in SOFOFROM will be left out, except that all
white space is replaced by one space. Sequences of the same character in
SOFOFROM are replaced by one.

You can use the |soundfold()| function to try out the results. Or set the
'verbose' option to see the score in the output of the |z?| command.

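A Python sketch of simple soundfolding with the example tables above. The
names are hypothetical. "Sequences of the same character are replaced by one"
is read here as collapsing repeated characters in the result, so that
"summer" and "sunner" fold to the same string; treat that reading as an
assumption and compare with the output of |soundfold()|.

```python
def make_sofo(sofofrom, sofoto):
    """Build the character mapping from SOFOFROM and SOFOTO."""
    if len(sofofrom) != len(sofoto):
        raise ValueError("SOFOFROM and SOFOTO must have the same length")
    return dict(zip(sofofrom, sofoto))


def soundfold(word, table):
    """Map each character; drop unknown ones; collapse repeats."""
    out = []
    for c in word:
        if c.isspace():
            c = " "            # all white space becomes one space
        elif c in table:
            c = table[c]
        else:
            continue           # characters not in SOFOFROM are left out
        if not out or out[-1] != c:
            out.append(c)      # collapse repeats of the same result character
    return "".join(out)
```

With the example tables, "summer" and "sunner" both fold to "sener", and
"Bad  Man" folds to "bet nen".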

 vim:tw=78:sw=4:ts=8:ft=help:norl: