F.65. shared_ispell

The shared_ispell module provides a shared ispell dictionary, i.e. a dictionary that is stored in a shared memory segment. With the traditional ispell implementation, each session initializes and stores its own copy of the dictionary, which wastes a lot of CPU time and memory.

This extension allocates an area in the shared memory segment (you have to choose its size in advance) and then loads each dictionary into it the first time the dictionary is used.

F.65.1. Functions

The functions provided by the shared_ispell module are shown in Table F.50.

Table F.50. shared_ispell Functions

shared_ispell_reset() returns void

Resets the dictionaries (e.g. so that you can reload updated files from disk). Sessions that already use the dictionaries will be forced to reinitialize them.

shared_ispell_mem_used() returns int

Returns the amount of shared segment memory used by the loaded dictionaries, in bytes.

shared_ispell_mem_available() returns int

Returns the amount of shared segment memory that is still available, in bytes.

shared_ispell_dicts() returns setof (dict_name varchar, affix_name varchar, words int, affixes int, bytes int)

Returns a list of the dictionaries loaded in the shared segment.

shared_ispell_stoplists() returns setof (stop_name varchar, words int, bytes int)

Returns a list of the stop word lists loaded in the shared segment.
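A sketch of how these functions might be combined to inspect the shared segment. It assumes the library is preloaded and that the extension has been installed in the current database (CREATE EXTENSION shared_ispell); the bigint casts are only there because pg_size_pretty expects a bigint or numeric argument.

```sql
-- Dictionaries currently held in the shared segment, largest first
SELECT dict_name, affix_name, words, affixes,
       pg_size_pretty(bytes::bigint) AS size
  FROM shared_ispell_dicts()
 ORDER BY bytes DESC;

-- Stop word lists currently held in the shared segment
SELECT stop_name, words, pg_size_pretty(bytes::bigint) AS size
  FROM shared_ispell_stoplists();

-- Overall usage of the segment
SELECT pg_size_pretty(shared_ispell_mem_used()::bigint)      AS used,
       pg_size_pretty(shared_ispell_mem_available()::bigint) AS available;

-- After replacing dictionary files on disk, discard the shared copies;
-- sessions that already use them will reinitialize them
SELECT shared_ispell_reset();
```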

F.65.2. GUC Parameters

shared_ispell.max_size (int)

Defines the maximum size of the shared segment. This is a hard limit: the shared segment cannot grow, so you need to set it large enough for all the dictionaries to fit, but not so large that much memory is wasted.
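Once the library is preloaded, the parameter is registered like any other setting, so its current value can be checked from SQL. A minimal sketch:

```sql
-- Current limit for the shared segment
SHOW shared_ispell.max_size;

-- The same setting with its unit and context; a context of 'postmaster'
-- means a new value only takes effect after a server restart
SELECT name, setting, unit, context
  FROM pg_settings
 WHERE name = 'shared_ispell.max_size';
```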

F.65.3. Using the dictionary

The module needs to allocate space in the shared memory segment, so it must be loaded at server start. Add the following to the configuration file (or update the current values):

# libraries to load
shared_preload_libraries = 'shared_ispell'

# size of the shared memory segment used by shared_ispell
shared_ispell.max_size = 32MB

To find out how much memory you actually need, set a large value (e.g. 200MB) and load all the dictionaries you want to use (a dictionary is loaded the first time it is used). Then call the shared_ispell_mem_used() function to see how much memory was actually used, and set the shared_ispell.max_size parameter accordingly.
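Because dictionaries are loaded lazily, each one has to be used at least once before shared_ispell_mem_used() accounts for it. A sketch of the measuring step, assuming a dictionary named english_shared defined as in the example later in this section; repeat the first query for every shared dictionary you plan to use.

```sql
-- Use the dictionary once so that it is loaded into the shared segment
SELECT ts_lexize('english_shared', 'test');

-- Then check how much of the segment the loaded dictionaries occupy
SELECT shared_ispell_mem_used()                          AS used_bytes,
       pg_size_pretty(shared_ispell_mem_used()::bigint) AS used;
```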

Don't set it to exactly that value; leave some free space (something like 512kB should be fine) so that you can reload the dictionaries without increasing shared_ispell.max_size, which requires a restart of the database server.
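The headroom arithmetic can also be done in SQL. This sketch adds the suggested 512kB to the measured usage and rounds the result up to whole megabytes; the rounding is just one convenient choice for a postgresql.conf value, not a requirement of the module.

```sql
WITH seg AS (
    -- measured usage plus ~512kB of headroom for reloading dictionaries
    SELECT shared_ispell_mem_used()::bigint + pg_size_bytes('512kB') AS bytes
)
SELECT bytes                                AS suggested_bytes,
       ceil(bytes / 1048576.0)::int || 'MB' AS suggested_max_size
  FROM seg;
```

For instance, a hypothetical usage of 30000000 bytes gives 30524288 bytes with headroom, which rounds up to 30MB.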

The extension defines a shared_ispell text search template that you can use to define custom dictionaries. For example:

CREATE TEXT SEARCH DICTIONARY english_shared (
    TEMPLATE = shared_ispell,
    DictFile = en_us,
    AffFile = en_us,
    StopWords = english
);

CREATE TEXT SEARCH CONFIGURATION public.english_shared
    ( COPY = pg_catalog.simple );

ALTER TEXT SEARCH CONFIGURATION english_shared
    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
                    word, hword, hword_part
    WITH english_shared, english_stem;

You can test the new configuration:

SELECT * FROM ts_debug('english_shared', 'abilities');
   alias   |   description   |   token   |         dictionaries          |   dictionary   |  lexemes  
-----------+-----------------+-----------+-------------------------------+----------------+-----------
 asciiword | Word, all ASCII | abilities | {english_shared,english_stem} | english_shared | {ability}
(1 row)
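Besides ts_debug, the dictionary and the configuration can be exercised directly with the standard ts_lexize and to_tsvector functions. A minimal sketch using the english_shared dictionary and configuration created above:

```sql
-- Ask the shared dictionary alone; the result should agree with
-- the lexemes column reported by ts_debug
SELECT ts_lexize('english_shared', 'abilities');

-- Run a whole text through the configuration, which consults
-- english_shared first and falls back to english_stem
SELECT to_tsvector('english_shared', 'The abilities of shared dictionaries');
```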

Alternatively, you can update an existing text search configuration. For example, if you have a public.english configuration, you can switch it to the english_shared dictionary (based on the shared_ispell template):

ALTER TEXT SEARCH CONFIGURATION public.english
    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
                    word, hword, hword_part
    WITH english_shared, english_stem;
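A configuration updated this way is used like any other, for example in an expression index. This sketch assumes a hypothetical docs table with a text column body; naming the configuration explicitly keeps the indexed expression immutable, and queries must repeat the same expression to use the index.

```sql
-- Hypothetical table, for illustration only
CREATE TABLE docs (id serial PRIMARY KEY, body text);

-- GIN expression index built with the updated configuration
CREATE INDEX docs_body_fts_idx
    ON docs USING gin (to_tsvector('public.english', body));

-- The WHERE clause uses the same expression as the index
SELECT id
  FROM docs
 WHERE to_tsvector('public.english', body)
       @@ to_tsquery('public.english', 'abilities');
```

Indexes built before the mapping was changed still contain lexemes produced by the old dictionaries, so rebuilding them (e.g. with REINDEX) keeps them consistent with new queries.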

F.65.4. Author

Tomas Vondra, Prague, Czech Republic