Re: [PROPOSAL] Shared Ispell dictionaries - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: [PROPOSAL] Shared Ispell dictionaries
Date
Msg-id b9e470f7-01b5-bf8c-78fc-5633ebc6fa2e@2ndquadrant.com
In response to Re: [PROPOSAL] Shared Ispell dictionaries  (Arthur Zakirov <a.zakirov@postgrespro.ru>)
Responses Re: [PROPOSAL] Shared Ispell dictionaries  (Arthur Zakirov <a.zakirov@postgrespro.ru>)
List pgsql-hackers
On 1/21/19 12:51 PM, Arthur Zakirov wrote:
> On 21.01.2019 02:43, Tomas Vondra wrote:
>> On 1/20/19 11:21 PM, Andres Freund wrote:
>>> On 2019-01-20 23:15:35 +0100, Tomas Vondra wrote:
>>>> Thanks. I've reviewed v17 today and I haven't discovered any new issues
>>>> so far. If everything goes fine and no one protests, I plan to get it
>>>> committed over the next week or so.
>>>
>>> There doesn't seem to be any docs about what's needed to be able to take
>>> advantage of shared dicts, and how to prevent them from permanently
>>> taking up a significant share of memory.
>>>
>>
>> Yeah, those are good points. I agree the comments might be clearer, but
>> essentially ispell dictionaries are shared and everything else is not.
>>
>> As for the memory consumption / unloading dicts - I agree that's
>> something we need to address. There used to be a way to specify memory
>> limit and ability to unload dictionaries explicitly, but both features
>> have been ditched. The assumption was that UNLOAD would be introduced
>> later, but that does not seem to have happened.
> 
> I'll try to implement the syntax, you suggested earlier:
> 
> ALTER TEXT SEARCH DICTIONARY x UNLOAD/RELOAD
> 
> The main point here is that UNLOAD/RELOAD can't release the memory
> immediately, because some other backend may pin a DSM.
> 
> The second point we should consider (I think) - how do we know which
> dictionary should be unloaded. There was such function earlier, which
> was removed. But what about adding an information in the "\dFd" psql's
> command output? It could be a column which shows is a dictionary loaded.
> 

The UNLOAD capability is probably a good start, but it's entirely manual
and I wonder if it's putting too much burden on the user. I mean, the
user has to realize the dictionaries are using a lot of shared memory,
has to decide which to unload, and then has to run UNLOAD on it. That's
not quite straightforward, especially if there's no way to determine
which dictionaries are currently loaded and how much memory they use :-(

Of course, the problem is not exactly new - we don't show which
dictionaries are already loaded into private memory either. The only
"unload" capability we have is closing the connection. OTOH the memory
consumption should be much lower thanks to using shared memory. So I
think the patch is an improvement even in this regard.

I wonder if we could devise some simple cache eviction policy. We don't
have any memory limit GUC anymore, but maybe we could unload
dictionaries that were unused for a sufficient amount of time (a couple
of minutes or so). Of course, the question is when exactly that would
happen (it seems far too expensive to invoke on each dict access, and it
should happen even when the dicts are not accessed at all).
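
To make the idea a bit more concrete, here is a rough sketch of what such
a time-based policy could look like. It is only an illustration, not code
from the patch: DictEntry, dict_touch() and dict_evict_idle() are
hypothetical names, and the assumption that some periodic process (rather
than the dictionary lookup path) calls the sweep is mine. Each access only
updates a timestamp, which is cheap, and the sweep skips anything that is
still pinned by a backend.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical bookkeeping for one shared dictionary (not a PostgreSQL struct). */
typedef struct DictEntry
{
    const char *name;       /* dictionary name, for reporting only */
    bool        loaded;     /* is the dictionary currently in shared memory? */
    int         pin_count;  /* number of backends still using it */
    time_t      last_used;  /* time of the most recent access */
} DictEntry;

/* Record an access. This is the only work done on the lookup path. */
static void
dict_touch(DictEntry *entry, time_t now)
{
    assert(entry != NULL);
    entry->last_used = now;
}

/*
 * Sweep all entries and unload those that are loaded, unpinned, and idle
 * for at least max_idle_secs. Meant to be called periodically, independent
 * of dictionary accesses. Returns the number of entries unloaded.
 */
static size_t
dict_evict_idle(DictEntry *entries, size_t n, time_t now, double max_idle_secs)
{
    size_t      evicted = 0;

    assert(entries != NULL || n == 0);

    for (size_t i = 0; i < n; i++)
    {
        DictEntry  *e = &entries[i];

        /* Nothing to do for unloaded entries; pinned ones cannot be released. */
        if (!e->loaded || e->pin_count > 0)
            continue;

        if (difftime(now, e->last_used) >= max_idle_secs)
        {
            /* A real implementation would release the shared segment here. */
            e->loaded = false;
            evicted++;
        }
    }

    return evicted;
}
```

With a threshold of, say, 180 seconds, a dictionary that nobody has
touched or pinned for three minutes would be dropped by the next sweep,
while one that is still pinned stays loaded no matter how old its
timestamp is.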

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

