Re: [PROPOSAL] Shared Ispell dictionaries - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: [PROPOSAL] Shared Ispell dictionaries
Msg-id 90c04bf7-fcab-8b0b-b461-43b46bf79970@2ndquadrant.com
In response to Re: [PROPOSAL] Shared Ispell dictionaries  (Andres Freund <andres@anarazel.de>)
Responses Re: [PROPOSAL] Shared Ispell dictionaries  (Arthur Zakirov <a.zakirov@postgrespro.ru>)
List pgsql-hackers

On 03/19/2018 07:07 PM, Andres Freund wrote:
> On 2018-03-19 14:52:34 +0100, Tomas Vondra wrote:
>> On 03/19/2018 02:34 AM, Andres Freund wrote:
>>> Hi,
>>>
>>> On 2018-03-19 01:52:41 +0100, Tomas Vondra wrote:
>>>> I do agree with that. We have a working well-understood dsm-based
>>>> solution, addressing the goals initially explained in this thread.
>>>
>>> Well, it's also awkward and manual to use. I do think that's
>>> something we have to pay attention to.
>>>
>>
>> Awkward in what sense?
> 
> You have to manually configure a setting that can only be set at server
> start. You can't set it as large as necessary, because it might use up
> memory better used for other things. It needs the full space for
> dictionaries even if most of it will never be needed. None of that is
> needed in an mmap world.
> 

That's not quite accurate, because it's not what the patch does.

Each dictionary is loaded into a separate dsm segment only when it's
needed, and that segment is then registered in a dshash table. So most
of those objections don't apply: the patch does not pre-allocate the
space, and the setting could be changed even after server start (it's
not defined that way currently, but that should be trivial to change).

> 
>> So, I'm not at all convinced the mmap approach is actually better
>> than the dsm one. And if we come up with a good way to automate
>> some of the tasks, I don't see why that would be possible with
>> mmap and not with dsm.
> 
> To me it seems we'll end up needing a heck of a lot more code that
> the OS already implements if we do it ourselves.
> 

Like what? Which features do you expect to need much more code?

The automated reloading will need a fairly small amount of code - the
main issue is deciding when to reload, and as I mentioned before, that's
more complicated than you seem to believe. In fact, it may not even be
possible - there's no way to tell whether all of the dictionary's files
have already been updated. Currently we more or less ignore that, on the
assumption that dictionaries change only rarely. We could do the same
thing and reload the dictionary as soon as at least one of its files
changes. In any case, the amount of code is trivial.
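The "reload if at least one file changed" policy fits in a few lines. The following is a hedged Python sketch rather than the patch's code. `snapshot` and `needs_reload` are hypothetical helpers, and modification times (`os.stat().st_mtime_ns`) are assumed to be a good enough change signal. As noted above, this cannot tell whether an update of all the files has finished.

```python
import os
from typing import Dict, Iterable


def snapshot(paths: Iterable[str]) -> Dict[str, int]:
    """Record the modification time (ns) of every file a dictionary uses."""
    return {p: os.stat(p).st_mtime_ns for p in paths}


def needs_reload(loaded: Dict[str, int]) -> bool:
    """True if any file differs from the snapshot taken at load time.

    A missing file also counts as a change. This can fire while an update
    is still in progress, e.g. after one file was replaced but not the rest.
    """
    for path, mtime in loaded.items():
        try:
            if os.stat(path).st_mtime_ns != mtime:
                return True
        except FileNotFoundError:
            return True
    return False
```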

In fact, it may be more complicated in the mmap case - how do you update
a dictionary that is already mapped into multiple processes?

The eviction is harder - I'll give you that. But then again, I'm not
sure the mmap approach is really what we want here - it seems better to
evict a whole dictionary than random pages from many of them.
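The following minimal Python sketch makes whole-dictionary eviction concrete. It assumes a plain LRU policy under a byte budget. Neither the policy nor the hypothetical `DictCache` name comes from the patch. The unit of eviction is an entire dictionary, so a cached dictionary is either fully usable or gone, rather than having some of its pages dropped by the kernel.

```python
from collections import OrderedDict
from typing import Optional


class DictCache:
    """Hypothetical LRU cache that evicts whole dictionaries, never parts."""

    def __init__(self, budget_bytes: int):
        self.budget_bytes = budget_bytes
        self.entries: "OrderedDict[str, bytes]" = OrderedDict()
        self.used = 0

    def get(self, name: str) -> Optional[bytes]:
        blob = self.entries.get(name)
        if blob is not None:
            self.entries.move_to_end(name)  # mark as most recently used
        return blob

    def put(self, name: str, blob: bytes) -> None:
        if len(blob) > self.budget_bytes:
            raise ValueError("dictionary larger than the whole budget")
        if name in self.entries:
            self.used -= len(self.entries.pop(name))
        # Drop least recently used dictionaries, whole, until the new one fits.
        while self.used + len(blob) > self.budget_bytes:
            _, evicted = self.entries.popitem(last=False)
            self.used -= len(evicted)
        self.entries[name] = blob
        self.used += len(blob)
```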

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

