Re: [PROPOSAL] Shared Ispell dictionaries - Mailing list pgsql-hackers

From Arthur Zakirov
Subject Re: [PROPOSAL] Shared Ispell dictionaries
Date
Msg-id CAKNkYnzNJ1NoWaEUcCPDFjfKar1VPeNwxaZ0WyCvxoY+dhWEkQ@mail.gmail.com
In response to Re: [PROPOSAL] Shared Ispell dictionaries  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-hackers
Tomas Vondra wrote:
> On 03/31/2018 12:42 PM, Arthur Zakirov wrote:
> > Hello all,
> >
> > I'd like to add a new optional function to the text search template,
> > named fini, in addition to init() and lexize(). It will be called by
> > RemoveTSDictionaryById() and AlterTSDictionary(). dispell_fini() will
> > call ts_dict_shmem_release().
> >
> > It doesn't change the segment-leaking situation, but I think it makes
> > the text search API more transparent.

> If it doesn't actually solve the problem, why add it? I don't see a
> point in adding functions for the sake of transparency, when it does not
> in fact serve any use cases.

It doesn't solve the problem, but it makes the contract clearer: if a dictionary requested a shared memory location, it should also release/unpin it. There is no such scenario yet, but someone might want to release not only the shared segment but also other private data.
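To illustrate, a rough sketch of what dispell_fini() could look like (the argument handling and the exact signature of ts_dict_shmem_release() here are only assumptions for the example, not the final patch code):

    #include "postgres.h"
    #include "fmgr.h"

    /*
     * Sketch only: a hypothetical fini function for the ispell template.
     * ts_dict_shmem_release() is the release function from the patch; its
     * signature here is assumed.
     */
    Datum
    dispell_fini(PG_FUNCTION_ARGS)
    {
        /* OID of the dictionary being dropped or altered */
        Oid         dictid = PG_GETARG_OID(0);

        /* unpin this dictionary's shared segment mapping for the backend */
        ts_dict_shmem_release(dictid);

        PG_RETURN_VOID();
    }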

> Can't we handle the segment-leaking by adding some sort of tombstone?

Interestingly, such tombstones already exist, even without the patch: TSDictionaryCacheEntry entries aren't deleted after DROP, they are just marked with isvalid = false.
 
> For example, imagine that instead of removing the hash table entry we
> mark it as 'dropped'. And after that, after the lookup we would know the
> dictionary was removed, and the backends would load the dictionary into
> their private memory.
>
> Of course, this could mean we end up having many tombstones in the hash
> table. But those tombstones would be tiny, making it less painful than
> potentially leaking much more memory for the dictionaries.

Actually, right now it isn't guaranteed that the hash table entry will be removed, even when refcnt reaches 0. So I think I should remove refcnt and simply never remove the entries.

There is no big leaking problem now. Memory can leak only if a dictionary was dropped or altered, there is no text search workload anymore, and the backend is still alive: the next use of a text search function unpins the segments previously mapped for invalid dictionaries (isvalid == false). The segment is also unpinned when the backend terminates. Once all interested processes have unpinned the segment it is destroyed (as Tom noticed), and the hash table entry becomes a tombstone.
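Roughly, the lookup path I have in mind looks like this (the entry layout and all helper names below are invented for the illustration and do not match the patch's code):

    #include "postgres.h"
    #include "storage/dsm.h"

    /* Simplified shared hash entry for one dictionary (illustrative only) */
    typedef struct TsDictShmemEntry
    {
        Oid         dict_id;    /* hash key */
        dsm_handle  segment;    /* shared segment holding the dictionary */
        bool        dropped;    /* tombstone: dictionary was dropped/altered */
    } TsDictShmemEntry;

    void *
    lookup_shared_dict(Oid dict_id)
    {
        TsDictShmemEntry *entry = shared_hash_find(dict_id);   /* hypothetical */

        if (entry == NULL || entry->dropped)
        {
            /*
             * Tombstone (or no entry at all): unpin whatever mapping this
             * backend still holds for the old segment, then fall back to
             * loading the dictionary into backend-private memory.
             */
            release_old_mapping(dict_id);                       /* hypothetical */
            return NULL;
        }

        /* attach to the pinned segment and return its mapped address */
        return attach_to_segment(entry->segment);               /* hypothetical */
    }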

I hope I described it clearly.
 
> Also, I wonder if we might actually remove the dictionaries after a
> while, e.g. based on XID. Imagine that we note the XID of the
> transaction removing the dictionary, or perhaps XID of the most recent
> running transaction. Then we could use this to decide if all running
> transactions actually see the DROP, and we could remove the tombstone.

Maybe autovacuum should work here too :) That's a joke, of course. I'm not very familiar with how dead tuple removal works, but I think this is a similar case.
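If we went that way, the check could look something like this (just a sketch: drop_xid is an invented field, and get_oldest_running_xmin() and shared_hash_delete() are placeholders, not real functions):

    /* Sketch of XID-based tombstone cleanup, per the idea above. */
    static void
    maybe_remove_tombstone(TsDictShmemEntry *entry)
    {
        TransactionId oldest_xmin = get_oldest_running_xmin();  /* placeholder */

        /*
         * If the transaction that dropped the dictionary precedes every
         * running transaction, everyone can see the DROP and the tombstone
         * can finally be removed from the shared hash table.
         */
        if (TransactionIdPrecedes(entry->drop_xid, oldest_xmin))
            shared_hash_delete(entry->dict_id);                  /* hypothetical */
    }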


--
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
