Re: [PROPOSAL] Shared Ispell dictionaries - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: [PROPOSAL] Shared Ispell dictionaries
Date
Msg-id ada73d07-6c88-acab-7e31-75b2eb724d24@2ndquadrant.com
Whole thread Raw
In response to Re: [PROPOSAL] Shared Ispell dictionaries  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [PROPOSAL] Shared Ispell dictionaries  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers

On 3/24/18 9:56 PM, Tom Lane wrote:
> Arthur Zakirov <a.zakirov@postgrespro.ru> writes:
>> [ v10 patch versions ]
> 
> I took a quick look through this.  I agree with the comments about
> mmap-ability not being something we should insist on now, and maybe
> not ever.  However, in order to keep our options open, it seems like
> we should minimize the amount of API we expose that's based on the
> current implementation.  That leads me to the following thoughts:
> 
> * I cannot imagine a use-case for setting max_shared_dictionaries_size
> to anything except "unlimited".  If it's not that, and you exceed it,
> then subsequent backends load private copies of the dictionary, making
> your memory situation rapidly worse not better.  I think we should lose
> that GUC altogether and just load dictionaries automatically.
> 

Introduction of that limit is likely my fault. It came from from an
extension I wrote a long time ago, but back then it was a necessity
because we did not have DSM. So in retrospect I agree with you - it's
not particularly useful and we should ditch it.

Arthur, let this be a lesson for you! You have to start fight against
bogus feature requests from other people ;-)

> * Similarly, I see no point in a "sharable" option on individual
> dictionaries, especially when there's only one allowed setting for
> most dictionary types.  Let's lose that too.
> 

I'm not so sure. Imagine you have a small number of dictionaries that
are used frequently, and then many that are used only once in a while.

A good example is a system handling documents in various languages,
serving "local" customers in a few local languages most of the time, but
then once in a while there's a request in another language.

In that case it makes sense to keep the frequently used ones in shared
memory all the time, and the rest load as needed (and throw it away when
the backend disconnects). Which is exactly what the 'shareable' option
is about ...

> * And that leads us to not particularly need a view telling which
> dictionaries are loaded, either.  It's just an implementation detail
> that users don't need to worry about.
> 

Not so sure about this either. Of course, if we remove the memory limit,
it will be predictable which dictionaries are loaded in shared memory
and which are in the backends.

> This does beg the question of whether we need a way to flush dictionary
> contents that's short of restarting the server (or short of dropping and
> recreating the dictionary).  I'm not sure, but even if we do, none of
> the above is necessary for that.
> 

Ummm, I don't follow. Why would be flushing a dictionary similar to
restarting a server? It's certainly simpler to do a RELOAD on an
existing dictionary than having to do DROP+CREATE. I would not be
surprised if DROP+CREATE was significantly harder process-wise for many
admins (I mean, more approvals).

>
> I do think it's required that changing the dictionary's options with
> ALTER TEXT SEARCH DICTIONARY automatically cause a reload; but if that's
> happening with this patch, I don't see where.  (It might work to use
> the combination of dictionary OID and TID of the dictionary's pg_ts_dict
> tuple as the lookup key for shared dictionaries.  Oh, and have you
> thought about the possibility of conflicting OIDs in different DBs?
> Probably the database OID has to be part of the key, as well.)
> 

Not sure.

> Also, the scheme for releasing the dictionary DSM during
> RemoveTSDictionaryById is uncertain and full of race conditions:
> the DROP might roll back later, or someone might come along and
> start using the dictionary (causing a fresh DSM load) before the
> DROP commits and makes the dictionary invisible to other sessions.
> I don't think that either of those are necessarily fatal objections,
> but there needs to be some commentary there explaining what happens.
> 

Actually, I think that's an issue - such race condition might easily
leak the shared memory forever (because the new dictionary will get a
different OID etc.). It probably is not happening very often, because
dictionaries are not dropped very often. But it needs fixing I think.

> BTW, I was going to complain that this patch alters the API for
> dictionary template init functions without any documentation updates;
> but then I realized that there isn't any documentation to update.
> That pretty well sucks, but I suppose it's not the job of this patch
> to improve that situation.  Still, you could spend a bit more effort on
> the commentary in ts_public.h in 0002, because that commentary is as
> close to an API spec as we've got.
> 

Yeah :-(


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Using base backup exclusion filters to reduce data transferred with pg_rewind
Next
From: Tom Lane
Date:
Subject: Re: [PROPOSAL] Shared Ispell dictionaries