Re: [PROPOSAL] Shared Ispell dictionaries - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: [PROPOSAL] Shared Ispell dictionaries
Msg-id 9c27384d-4538-15be-0e94-0a67a52e7b0b@2ndquadrant.com
In response to Re: [PROPOSAL] Shared Ispell dictionaries  (Andres Freund <andres@anarazel.de>)
Responses Re: [PROPOSAL] Shared Ispell dictionaries  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On 03/19/2018 02:34 AM, Andres Freund wrote:
> Hi,
> 
> On 2018-03-19 01:52:41 +0100, Tomas Vondra wrote:
>> I do agree with that. We have a working well-understood dsm-based
>> solution, addressing the goals initially explained in this thread.
> 
> Well, it's also awkward and manual to use. I do think that's
> something we've to pay attention to.
> 

Awkward in what sense?

I don't think the manual aspect is an issue. Currently we have no way to
reload the dictionary, except for restarting all the backends. I don't
see that as a particularly convenient solution. Also, this is pretty
much how the shared_ispell extension works, although you might argue
that was mostly due to the limitations on how shared memory could be used
in extensions before DSM was introduced. In any case, I've never heard
complaints about this aspect of the extension.

There are two things that might be automated: reloading dictionaries
and evicting them when hitting the memory limit. I have tried to
implement that in the shared_ispell extension, but it's a bit more
complicated than it looks.
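To make the eviction part concrete, here is a minimal C sketch of evicting the least-recently-used dictionaries once a memory limit is exceeded. The SharedDictEntry struct and evict_lru() function are hypothetical, not taken from the patch or from shared_ispell, and the sketch deliberately ignores the hard part: backends that may still be using the dictionary being evicted.

```c
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical bookkeeping for one dictionary loaded into shared memory. */
typedef struct SharedDictEntry
{
    size_t      size;       /* bytes used by the dictionary */
    time_t      last_used;  /* time of the last lookup by any backend */
    bool        loaded;     /* currently resident in shared memory? */
} SharedDictEntry;

/*
 * Evict least-recently-used dictionaries until the total size of the
 * loaded ones fits within limit. Returns the number of entries evicted.
 * NOTE: a real implementation would also have to make sure no backend
 * still references an evicted dictionary; this sketch does not.
 */
static int
evict_lru(SharedDictEntry *dicts, int ndicts, size_t limit)
{
    size_t      total = 0;
    int         evicted = 0;

    for (int i = 0; i < ndicts; i++)
        if (dicts[i].loaded)
            total += dicts[i].size;

    while (total > limit)
    {
        int         victim = -1;

        /* pick the loaded dictionary with the oldest last_used */
        for (int i = 0; i < ndicts; i++)
            if (dicts[i].loaded &&
                (victim < 0 || dicts[i].last_used < dicts[victim].last_used))
                victim = i;

        if (victim < 0)
            break;              /* nothing left to evict */

        dicts[victim].loaded = false;
        total -= dicts[victim].size;
        evicted++;
    }
    return evicted;
}
```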

For example, it seems obvious to reload the dictionary when the file
timestamp changes. But in fact there are three files: dictionary,
affixes and stopwords. So do you reload when a single file changes, or
only once all of them do? Keep in mind that the new version of a
dictionary may use different affixes, so a reload at the wrong moment
may produce broken results.
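A minimal sketch of the naive timestamp approach, to illustrate the problem. It assumes the three file paths are known and compares stat() modification times; DictFileStamps, read_stamps(), stamps_changed() and should_reload() are hypothetical names. The "quiet period" check is just one possible heuristic (wait until no file has changed for a while), and it does not guarantee that the dictionary and affix files actually form a consistent pair.

```c
#include <stdbool.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical snapshot of the mtimes of the three dictionary files. */
typedef struct DictFileStamps
{
    time_t      dict_mtime;
    time_t      affix_mtime;
    time_t      stop_mtime;
} DictFileStamps;

/* Fill a snapshot using stat(); returns false if any file is missing. */
static bool
read_stamps(const char *dict, const char *affix, const char *stop,
            DictFileStamps *out)
{
    struct stat st;

    if (stat(dict, &st) != 0)
        return false;
    out->dict_mtime = st.st_mtime;
    if (stat(affix, &st) != 0)
        return false;
    out->affix_mtime = st.st_mtime;
    if (stat(stop, &st) != 0)
        return false;
    out->stop_mtime = st.st_mtime;
    return true;
}

/* True if any of the three files changed since the old snapshot. */
static bool
stamps_changed(const DictFileStamps *old, const DictFileStamps *cur)
{
    return old->dict_mtime != cur->dict_mtime ||
        old->affix_mtime != cur->affix_mtime ||
        old->stop_mtime != cur->stop_mtime;
}

/*
 * Heuristic only: reload once something changed AND the newest change is
 * at least quiet_secs old, hoping all files have been replaced by then.
 * This does not prove the new dict/affix files belong together.
 */
static bool
should_reload(const DictFileStamps *old, const DictFileStamps *cur,
              time_t now, time_t quiet_secs)
{
    time_t      newest = cur->dict_mtime;

    if (!stamps_changed(old, cur))
        return false;
    if (cur->affix_mtime > newest)
        newest = cur->affix_mtime;
    if (cur->stop_mtime > newest)
        newest = cur->stop_mtime;
    return now - newest >= quiet_secs;
}
```

Even with the quiet period, an administrator who replaces the affix file and then, a minute later, the dictionary file would trigger a reload in between, which is exactly the "wrong moment" described above.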

> 
>> I wonder how much of this patch would be affected by the switch 
>> from dsm to mmap? I guess the memory limit would get mostly 
>> irrelevant (mmap would rely on the OS to page the memory in/out 
>> depending on memory pressure), and so would the UNLOAD/RELOAD 
>> commands (because each backend would do its own mmap).
> 
> Those seem fairly major.
> 

I'm not sure I'd say those are major. And you might also see the lack of
these capabilities as drawbacks of the mmap approach.

So, I'm not at all convinced the mmap approach is actually better than
the dsm one. And if we come up with a good way to automate some of
these tasks, I don't see why that would be possible with mmap but not
with dsm.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

