Hello, hackers!
Introduction
------------
I'm going to implement a patch which will store Ispell dictionaries in shared memory.
There is an extension, shared_ispell [1], developed by Tomas Vondra, but it is a poor candidate for inclusion in
contrib, because it has to know a lot about the internals of the IspellDict struct in order to copy it into shared
memory.
Why
---
A shared Ispell dictionary gives the following improvements:
- lower memory consumption: currently an Ispell dictionary is loaded into memory by every backend, and some
dictionaries require more than 100 MB
- no overhead during the first call of a full text search function (such as to_tsvector() or to_tsquery())
Implementation
--------------
It is necessary to change all structures related to IspellDict that will live in shared memory: SPNode, AffixNode,
AFFIX, CMPDAffix and IspellDict itself. For that reason, none of them should use pointers. The other structures are
used only during dictionary building, so they don't need to change.
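To illustrate what "no pointers" means here, below is a minimal standalone sketch. It is not code from the patch: SegHeader, DemoNode, node_at() and the other names are made up for this example, and the layout is much simpler than the real SPNode. The assumption is that a shared segment can be mapped at a different address in each backend, so nodes refer to each other by byte offsets from the start of the segment rather than by raw pointers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* All names below are hypothetical; this is not the real SPNode layout. */
typedef uint32_t NodeOffset;    /* byte offset from the start of the segment */
#define INVALID_OFFSET 0        /* offset 0 holds the header, never a node */

typedef struct SegHeader
{
    uint32_t    used;           /* bytes already handed out */
    uint32_t    size;           /* total size of the segment */
    NodeOffset  root;           /* first node of the top level */
} SegHeader;

typedef struct DemoNode
{
    char        c;              /* character stored in this node */
    uint8_t     is_word;        /* does a word end here? */
    NodeOffset  child;          /* first node of the next level */
    NodeOffset  sibling;        /* next node of the same level */
} DemoNode;

static void
seg_init(void *base, uint32_t size)
{
    SegHeader  *hdr = (SegHeader *) base;

    assert(size >= sizeof(SegHeader));
    memset(base, 0, size);
    hdr->used = (uint32_t) sizeof(SegHeader);
    hdr->size = size;
    hdr->root = INVALID_OFFSET;
}

/* Turn an offset into a pointer that is valid only in this process. */
static DemoNode *
node_at(void *base, NodeOffset off)
{
    assert(off != INVALID_OFFSET);
    return (DemoNode *) ((char *) base + off);
}

static NodeOffset
node_alloc(void *base, char c)
{
    SegHeader  *hdr = (SegHeader *) base;
    NodeOffset  off = hdr->used;
    DemoNode   *node;

    assert(hdr->used + sizeof(DemoNode) <= hdr->size);
    hdr->used += (uint32_t) sizeof(DemoNode);
    node = node_at(base, off);
    node->c = c;
    node->is_word = 0;
    node->child = INVALID_OFFSET;
    node->sibling = INVALID_OFFSET;
    return off;
}

static void
trie_insert(void *base, const char *word)
{
    NodeOffset *link = &((SegHeader *) base)->root;
    DemoNode   *node = NULL;

    for (; *word; word++)
    {
        /* Walk the sibling list looking for the current character. */
        while (*link != INVALID_OFFSET && node_at(base, *link)->c != *word)
            link = &node_at(base, *link)->sibling;
        if (*link == INVALID_OFFSET)
            *link = node_alloc(base, *word);
        node = node_at(base, *link);
        link = &node->child;
    }
    if (node != NULL)
        node->is_word = 1;
}

static int
trie_contains(void *base, const char *word)
{
    NodeOffset  off = ((SegHeader *) base)->root;
    DemoNode   *node = NULL;

    for (; *word; word++)
    {
        while (off != INVALID_OFFSET && node_at(base, off)->c != *word)
            off = node_at(base, off)->sibling;
        if (off == INVALID_OFFSET)
            return 0;
        node = node_at(base, off);
        off = node->child;
    }
    return node != NULL && node->is_word;
}
```

Since only offsets are stored, the whole buffer can be copied or mapped at another address and lookups still work. Raw pointers such as the result of node_at() are used only transiently inside one process and are never written into the segment.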
It would be good to store the StopList struct in shared memory too.
All fields of the IspellDict struct which are used only during dictionary building will be moved into a new
IspellDictBuild struct to decrease the required shared memory size. They are going to be released through buildCxt.
Each dictionary will be stored in its own DSM segment. Structures for regular expressions won't be stored in shared
memory. They will be compiled separately in every backend.
I plan to have the patch ready and to add it to the 2018-03 commitfest.
Thank you for your attention. Any thoughts?
[1] - github.com/tvondra/shared_ispell or github.com/postgrespro/shared_ispell
--
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company