Re: [PROPOSAL] Shared Ispell dictionaries - Mailing list pgsql-hackers

From Arthur Zakirov
Subject Re: [PROPOSAL] Shared Ispell dictionaries
Date
Msg-id 20180319191830.GA16075@arthur.localdomain
In response to Re: [PROPOSAL] Shared Ispell dictionaries  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-hackers
On Mon, Mar 19, 2018 at 07:40:54PM +0100, Tomas Vondra wrote:
> 
> 
> On 03/19/2018 07:07 PM, Andres Freund wrote:
> > You've to manually configure a setting that can only be set at server
> > start.  You can't set it as big as necessary because it might use up
> > memory better used for other things.  It needs the full space for
> > dictionaries even if the majority of it never will be needed.  All of
> > those aren't needed in an mmap world.
> > 
> 
> Which is not quite true, because that's not what the patch does.
> 
> Each dictionary is loaded into a separate dsm segment when needed, which
> is then stored in a dhash table. So most of what you wrote is not really
> true - the patch does not pre-allocate the space, and the setting might
> be set even after server start (it's not defined like that currently,
> but that should be trivial to change).

Oh, that's true. I had planned to fix it, but somehow I forgot to allow
changing the max_shared_dictionaries_size GUC via pg_reload_conf(). I'll fix
that and send a new version of the patch.
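
To illustrate why a reloadable limit is cheap when nothing is preallocated,
here is a toy sketch in Python (for brevity, not backend C). It is not the
patch's code: the class, its methods, and in particular the behaviour when the
limit is exceeded (returning a private, unshared copy) are assumptions made
for the example. Only the idea of loading each dictionary on first use and
checking the limit at load time comes from the discussion above.

```python
class SharedDictCache:
    """Toy stand-in for the shared dictionary table (hypothetical, not the patch)."""

    def __init__(self, max_size):
        self.max_size = max_size  # plays the role of max_shared_dictionaries_size
        self.segments = {}        # dictionary name -> loaded bytes
        self.used = 0             # total bytes currently shared

    def set_max_size(self, new_max):
        # Nothing is preallocated, so a config reload only updates the limit.
        self.max_size = new_max

    def get(self, name, load):
        """Return (data, shared) for `name`, loading it on first use."""
        if name in self.segments:
            return self.segments[name], True
        data = load(name)
        if self.used + len(data) > self.max_size:
            # Assumption: over the limit, hand back a private copy instead.
            return data, False
        self.segments[name] = data
        self.used += len(data)
        return data, True
```

With this shape, raising the limit at runtime takes effect on the next load,
and already-loaded dictionaries are left alone.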

> > To me it seems we'll end up needing a heck of a lot more code that
> > the OS already implements if we do it ourselves.
> > 
> 
> Like what? Which features do you expect to need much more code?
> 
> The automated reloading will need a fairly small amount of code - the
> main issue is deciding when to reload, and as I mentioned before that's
> more complicated than you seem to believe. In fact, it may not even be
> possible - there's no way to decide if all files are already updated.
> Currently we kinda ignore that, on the assumption that dictionaries
> change only rarely. We may do the same thing and reload the dict if at
> least one file changes. In any case, the amount of code is trivial.
> 
> In fact, it may be more complicated in the mmap case - how do you update
> a dictionary that is already mapped to multiple processes?
> 
> The eviction is harder - I'll give you that. But then again, I'm not
> sure the mmap approach is really what we want here - it seems better to
> evict the whole dictionary, than some random pages from many of them.

Agreed. The mmap approach requires the same code, plus code to handle the
cache files that will be mapped into memory. With mmap we would need to solve
the same issues we face now, and more. We also need some way to reload
dictionaries automatically in both cases.
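
To make the "reload the dict if at least one file changes" rule concrete, here
is a minimal sketch in Python (for brevity, not backend C). It records each
file's modification time and size at load time and reports a change when any
of them differs. The function names and the choice of mtime plus size are
assumptions for illustration, not part of the patch. As noted in the quoted
text, a check like this cannot tell whether all files of a multi-file update
have already been written.

```python
import os


def snapshot(paths):
    """Record (mtime_ns, size) for each dictionary file, or None if missing."""
    state = {}
    for path in paths:
        try:
            st = os.stat(path)
            state[path] = (st.st_mtime_ns, st.st_size)
        except FileNotFoundError:
            state[path] = None
    return state


def needs_reload(paths, loaded_state):
    """True if at least one file differs from the state recorded at load time."""
    return snapshot(paths) != loaded_state
```

A caller would keep the result of snapshot() next to the loaded dictionary and
call needs_reload() before using it; how often to check is a separate
question.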

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

