Home > mailing lists

Re: [PROPOSAL] Shared Ispell dictionaries - Mailing list pgsql-hackers

From	Tomas Vondra
Subject	Re: [PROPOSAL] Shared Ispell dictionaries
Date	January 14, 2018 00:33:14
Msg-id	404f06c5-c7a3-3313-1a64-25f08ba089ad@2ndquadrant.com Whole thread Raw
In response to	Re: [PROPOSAL] Shared Ispell dictionaries (Arthur Zakirov <a.zakirov@postgrespro.ru>)
Responses	Re: [PROPOSAL] Shared Ispell dictionaries
List	pgsql-hackers

Tree view

On 01/13/2018 04:22 PM, Arthur Zakirov wrote:
> Hello,
> 
> Thank you Tomas for your review.
> 
> On Sat, Jan 13, 2018 at 03:25:55AM +0100, Tomas Vondra wrote:
>> allocate memory in the buildCxt. What about adding tmpstrdup to copy a
>> string into the context? I admit this is mostly nitpicking though.
> 
> ... snip ...>
>> 8) One more thing - I've noticed that the hash table uses this key:
>> ...
>> That is, full paths to the two files, and I'm not sure that's a very
>> good idea. Firstly, it's a bit wasteful (1kB per path). But more
>> importantly it means all dictionaries referencing the same files will
>> share the same chunk of shared memory - not only within a single
>> database, but across the whole cluster. That may lead to surprising
>> behavior, because e.g. unloading a dictionary in one database will
>> affect dictionaries in all other databases referencing the same files.
> 
> Hm, indeed. It's worth to use only file names instead full paths. And
> it is good idea to use more information besides file names. It can be
> Oid of a database and Oid of a namespace maybe, because a
> dictionary can be created in different schemas.
> 

I doubt using filenames (without the directory paths) solves anything,
really. The keys still have to be MAXPGPATH because someone could create
very long filename. But I don't think memory consumption is such a big
deal, really. With 1000 dictionaries it's still just ~2MB of data, which
is negligible compared to the amount of memory saved by sharing the
dictionaries.

Not sure if we really need to add the database/schema OIDs. I mentioned
the unexpected consequences (cross-db sharing) but maybe that's a
feature we should keep (it reduces memory usage). So perhaps this should
be another CREATE TEXT SEARCH DICTIONARY parameter, allowing sharing the
dictionary with other databases?

Aren't we overengineering this?

> I think your proposals may be implemented in several patches, so they can
> be applyed independently but consistently. I suppose I will prepare new
> version of the patch with fixes and with initial design of new functions
> and commands soon.
> 

Yes, splitting patches into smaller, more focused bits is a good idea.

BTW the current patch fails to document the dictionary sharing. It only
mentions it when describing the shared_dictionaries GUC. IMHO the right
place for additional details is

https://www.postgresql.org/docs/10/static/textsearch-dictionaries.html

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

pgsql-hackers by date:

From: Michael Meskes
Date: 13 January 2018, 23:17:45
Subject: Re: [PATCH] ECPG bug fix in preproc when indicator struct isshorter than record struct

From: Stephen Frost
Date: 14 January 2018, 00:52:26
Subject: Re: proposal: alternative psql commands quit and exit

Re: [PROPOSAL] Shared Ispell dictionaries - Mailing list pgsql-hackers

Previous

Next