Re: [PATCHES] GIN improvements - Mailing list pgsql-hackers

From Tom Lane
Subject Re: [PATCHES] GIN improvements
Date
Msg-id 8993.1216843701@sss.pgh.pa.us
Whole thread Raw
In response to Re: [PATCHES] GIN improvements  (Teodor Sigaev <teodor@sigaev.ru>)
Responses Re: [PATCHES] GIN improvements  (Alvaro Herrera <alvherre@commandprompt.com>)
Re: [PATCHES] GIN improvements  (Teodor Sigaev <teodor@sigaev.ru>)
List pgsql-hackers
Teodor Sigaev <teodor@sigaev.ru> writes:
> Updated: http://www.sigaev.ru/misc/fast_insert_gin-0.9.gz

Here is the GIN fast-insert patch back again.  Changes:

* Sync with CVS HEAD
* Clean up documentation and some of the code comments
* Fix up custom reloptions code
* Suppress some compiler warnings

I didn't get much further than that because I got discouraged after
looking at the locking issues around the pending-insertions list.
It's a mess:

* shiftList() holds an exclusive lock on metapage throughout its run,
which means that it's impossible for two of them to run concurrently.
So why bother with "concurrent deletion" detection?

* shiftList does LockBufferForCleanup, which means that it can be blocked
for an indefinitely long time by a concurrent scan, and since it's holding
exclusive lock on metapage no new scans or insertions can start meanwhile.
This is not only horrid from a performance standpoint but it very probably
can result in deadlocks --- which will be deadlocks on LWLocks and thus
not even detected by the system.

* GIN index scans release lock and pin on one pending-list page before
acquiring pin and lock on the next, which means there's a race condition:
shiftList could visit and delete the next page before we get to it,
because there's a window where we're holding no buffer lock at all.
I think this isn't fatal in itself, since presumably the data in the next
page has been moved into the main index and we can scan it later, but the
scan code isn't checking whether the page has been deleted out from under
it.

* It seems also possible that once a list page has been marked
GIN_DELETED, it could be re-used for some other purpose before a
scan-in-flight reaches it -- reused either as a regular index page or as a
new list page.  Neither case is being defended against.  It might be that
the new-list-page case isn't a problem, or it might not.

* There is a bigger race condition, which is that after a scan has
returned a tuple from a pending page, vacuum could move the index entry
into the main index structure, and then that same scan could return that
same index entry a second time.  This is a no-no, and I don't see any easy
fix.

I haven't really finished reviewing this code, but I'm going to bounce it
back to you to see if you can solve the locking problems.  Unless that can
be made safe there is no point doing any more work on this patch.

            regards, tom lane


Attachment

pgsql-hackers by date:

Previous
From: "Kevin Grittner"
Date:
Subject: Re: [PATCHES] odd output in restore mode
Next
From: "Manoel Henrique"
Date:
Subject: Research/Implementation of Nested Loop Join optimization