Ispell
When mnoGoSearch is used with ispell support, both the indexer and the search front-end normalize all words. This makes it possible to find the same word with different endings. For example, if the words "testing" or "tests" are found in a document, the indexer stores the word "test" instead. The search front-end likewise looks for the word "test" when "testing" or "tests" is given in a search query. Note that this scheme rules out exact-form search, but it usually reduces database size and makes searching faster.
Note
If you add ispell support to an already existing database, reindexing is required. Otherwise non-normalized words will not be found at all.
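As a rough illustration of the scheme above, the following Python sketch normalizes words the same way at indexing time and at search time. The NORMAL_FORMS table and all function names are hypothetical; a real installation derives normal forms from ispell files, not from a hand-written table.

```python
# Toy illustration of normalizing words identically at index and search time.
# NORMAL_FORMS is a hypothetical stand-in for what ispell files would provide.
NORMAL_FORMS = {"testing": "test", "tests": "test"}


def normalize(word):
    """Return the base form of a word, or the lowercased word itself if unknown."""
    word = word.lower()
    return NORMAL_FORMS.get(word, word)


def build_index(documents):
    """Map each normalized word to the set of document ids containing it."""
    index = {}
    for doc_id, text in documents.items():
        for word in text.split():
            index.setdefault(normalize(word), set()).add(doc_id)
    return index


def search(index, query_word):
    """Normalize the query word the same way before looking it up."""
    return index.get(normalize(query_word), set())


documents = {1: "testing the parser", 2: "unit tests pass", 3: "no match here"}
index = build_index(documents)
print(sorted(search(index, "tests")))  # documents 1 and 2 both match
print("testing" in index)              # False: only the base form is stored
```

The last line shows why exact search is lost: the inflected form never reaches the index, so "testing" and "tests" cannot be told apart afterwards.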
Two types of ispell files
mnoGoSearch understands two types of ispell files: affix files and dictionaries. An ispell affix file contains rules for word endings and has approximately the following format:
Table 15.1. Flag V:
| E | > -E, IVE | As in create > creative |
| [^E] | > IVE | As in prevent > preventive |
Table 15.2. Flag *N:
| E | > -E, ION | As in create > creation |
| Y | > -Y, ICATION | As in multiply > multiplication |
| [^EY] | > EN | As in fall > fallen |
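To make the rule format concrete, here is a minimal Python sketch that applies the flag V and flag N rules from the tables above to a base word. Each rule is read as: if the word ends with the condition, strip the part after "-" (if any) and append the new ending. The RULES structure and the apply_flag function are hypothetical names chosen for this illustration and are not part of mnoGoSearch.

```python
import re

# Each rule: (condition on the word ending, suffix to strip, suffix to add).
# The rules are transcribed from the flag V and flag N tables above.
RULES = {
    "V": [("E", "E", "IVE"), ("[^E]", "", "IVE")],
    "N": [("E", "E", "ION"), ("Y", "Y", "ICATION"), ("[^EY]", "", "EN")],
}


def apply_flag(word, flag):
    """Return the derived forms of `word` produced by the rules of `flag`."""
    upper = word.upper()
    forms = []
    for condition, strip, add in RULES[flag]:
        # The condition must match at the end of the word.
        if re.search(condition + "$", upper):
            stem = upper[: len(upper) - len(strip)] if strip else upper
            forms.append((stem + add).lower())
    return forms


for word, flag in [("create", "V"), ("prevent", "V"), ("create", "N"),
                   ("multiply", "N"), ("fall", "N")]:
    print(word, flag, apply_flag(word, flag))
```

Running the loop reproduces the "As in" examples from the tables, for instance create with flag V gives creative and multiply with flag N gives multiplication.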
An ispell dictionary file contains the words themselves and has the following format:
Table 15.3.
| wop/S |
| word/DGJMS |
| wordage/S |
| wordbook |
| wordily |
| wordless/P |
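A dictionary entry is a word, optionally followed by "/" and the flags that apply to it. The short Python sketch below splits such entries, assuming that every flag is a single character, as in the sample above. The function name parse_dictionary_line is hypothetical.

```python
def parse_dictionary_line(line):
    """Split an ispell dictionary entry into (word, set of flag characters)."""
    entry = line.strip()
    # partition() leaves the flags empty when the entry has no "/".
    word, _, flags = entry.partition("/")
    return word, set(flags)


lines = ["wop/S", "word/DGJMS", "wordage/S", "wordbook", "wordily", "wordless/P"]
for line in lines:
    word, flags = parse_dictionary_line(line)
    print(word, sorted(flags))
```

Entries without a slash, such as wordbook, simply carry no affix rules.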
Ispell modes
mnoGoSearch for Windows can either store ispell files in an SQL database or load them from disk. To choose the ispell mode, use "IspellMode text" or "IspellMode db". Note that "db" mode works only with a supported SQL database and does not work with the built-in database. "text" mode is faster than "db" at indexing time, while "db" is faster than "text" at search time. You may therefore configure the indexer to use "text" mode and the search front-end to use "db" mode at the same time, provided the same ispell files have been properly imported into the database.
Text ispell mode
To enable text ispell mode, you must specify the Affix and Spell commands in both the indexer options and the search.htm file. The commands have the following format:
Affix <lang> <ispell affixes file name>
Spell <lang> <ispell dictionary file name>
The first parameter of both commands is a two-letter language abbreviation. The second is a file name. File names are relative to the mnoGoSearch /etc directory. Absolute paths can also be specified.
Note
Simultaneous loading of several languages is supported.
For example:
Affix en en.aff
Spell en en.dict
Affix de de.aff
Spell de de.dict
will load ispell support for both English and German languages.
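The following Python sketch shows one way such commands could be read and their file names resolved, purely as an illustration of the rules above and not as a description of how mnoGoSearch itself parses them. The ETC_DIR location, the function name and the absolute path /data/ispell/de.dict are hypothetical, and the sketch assumes the command names are written exactly as Affix and Spell.

```python
from pathlib import PurePosixPath

# Hypothetical location of the mnoGoSearch /etc directory; adjust to your install.
ETC_DIR = PurePosixPath("/usr/local/mnogosearch/etc")


def parse_ispell_commands(config_text, etc_dir=ETC_DIR):
    """Collect Affix/Spell commands as {lang: {"Affix": [...], "Spell": [...]}}."""
    result = {}
    for raw in config_text.splitlines():
        parts = raw.split(None, 2)
        if len(parts) != 3 or parts[0] not in ("Affix", "Spell"):
            continue  # skip unrelated or malformed lines
        command, lang, name = parts
        # Relative names are resolved against etc_dir; joining with an
        # absolute path keeps that path unchanged.
        path = etc_dir / name.strip()
        result.setdefault(lang, {"Affix": [], "Spell": []})[command].append(str(path))
    return result


config = """Affix en en.aff
Spell en en.dict
Affix de de.aff
Spell de /data/ispell/de.dict
"""
loaded = parse_ispell_commands(config)
print(loaded["en"]["Affix"])  # relative name, resolved under ETC_DIR
print(loaded["de"]["Spell"])  # absolute path, kept as given
```

Keeping a list per command and language mirrors the fact that several files, and several languages, can be loaded at once.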
Database ispell mode
You can import ispell data into an SQL database using the "indexer" program. After that, indexer, search.exe and the PHP front-end can be switched to use SQL to normalize words by specifying "IspellMode db" in search.htm and in the indexer options. "IspellMode db" gives faster results at search time.
To import ispell files, go to the ISpell tab of mnoGoSearch and press the Append button. In the dialog that opens, select the type of file you wish to add: affix for an affix file, or dict for a dictionary file.
Note
Ispell files supplied with the ispell packages for various languages may have extensions other than *.aff and *.dict. Select All files in the Open file dialog to select those files.
Customizing dictionary
Your site may contain rare words that are not in the ispell dictionaries. You can list such words in a plain text file with the following format (one word per line):
rare.dict:
----------
webmaster
intranet
.......
www
http
---------
You may also use ispell flags in this file (for ispell flags, refer to the ISpell documentation). This saves you from writing the same word with different endings, for example "webmaster" and "webmasters", in the rare words file. You can pick a word that follows the same inflection rules from an existing ispell dictionary and simply copy its flags. For example, the English dictionary has this line:
postmaster/MS
So webmaster with the MS flags will probably be OK:
webmaster/MS
Then copy this file to the /etc directory of mnoGoSearch and add it with the Spell command in the ISpell tab of mnoGoSearch.
During the next reindexing of all documents, the new words will be treated as correctly spelled. Only the genuinely misspelled words will remain.
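The flag-copying step described above, where webmaster borrows the flags of postmaster, can be sketched in Python as follows. The borrow_flags function is a hypothetical helper, and the sample lines are taken from the dictionary excerpts shown earlier in this section.

```python
def borrow_flags(new_word, model_word, dictionary_lines):
    """Return `new_word` with the flags of `model_word`, or bare if not found."""
    for line in dictionary_lines:
        word, _, flags = line.strip().partition("/")
        if word == model_word:
            return f"{new_word}/{flags}" if flags else new_word
    # The model word is not in the dictionary: emit the word without flags.
    return new_word


sample = ["word/DGJMS", "wordbook", "postmaster/MS"]
print(borrow_flags("webmaster", "postmaster", sample))  # webmaster/MS
```

The result is only as good as the choice of model word, so it is worth checking that both words really take the same endings.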