Re: [PATCHES] GIN improvements - Mailing list pgsql-hackers

From Tom Lane
Subject Re: [PATCHES] GIN improvements
Date
Msg-id 13239.1216848409@sss.pgh.pa.us
In response to Re: [PATCHES] GIN improvements  (Alvaro Herrera <alvherre@commandprompt.com>)
List pgsql-hackers
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Tom Lane wrote:
>> It's a mess:

> These are rather severe problems.  Maybe there's a better solution, but
> perhaps it would be good enough to lock out concurrent access to the
> index while the bulkinsert procedure is working.

Ugh...

The idea I was toying with was to not allow GIN scans to "stop" on
pending-insertion pages; rather, they should suck out all the matching
tuple IDs into backend-local memory as fast as they can, and then return
the TIDs to the caller one at a time from that internal array.  Then,
when the scan is later visiting the main part of the index, it could
check each matching TID against that array to see if it'd already
returned the TID.  (So it might be an idea to sort the TID array after
gathering it, to make those subsequent checks fast via binary search.)
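
A minimal sketch of that bookkeeping, in plain C rather than actual GIN code: the names Tid, PendingTidSet, pending_add, pending_sort and pending_contains are hypothetical and do not exist in PostgreSQL, and malloc-style allocation stands in for whatever the backend would really use.  It only illustrates the gather, sort, then binary-search idea.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a heap tuple ID (block number + offset). */
typedef struct {
    uint32_t block;
    uint16_t offset;
} Tid;

/* Backend-local array of TIDs gathered from the pending-insertion pages. */
typedef struct {
    Tid    *items;
    size_t  count;
    size_t  capacity;
    bool    sorted;
} PendingTidSet;

/* Order TIDs by block number, then by offset within the block. */
int tid_cmp(const void *a, const void *b)
{
    const Tid *x = a;
    const Tid *y = b;

    if (x->block != y->block)
        return x->block < y->block ? -1 : 1;
    if (x->offset != y->offset)
        return x->offset < y->offset ? -1 : 1;
    return 0;
}

/* Append one matching TID; returns false if memory could not be grown. */
bool pending_add(PendingTidSet *set, Tid tid)
{
    if (set->count == set->capacity) {
        size_t ncap = set->capacity ? set->capacity * 2 : 64;
        Tid *p = realloc(set->items, ncap * sizeof(Tid));

        if (p == NULL)
            return false;
        set->items = p;
        set->capacity = ncap;
    }
    set->items[set->count++] = tid;
    set->sorted = false;
    return true;
}

/* Sort once after the pending pages have been drained. */
void pending_sort(PendingTidSet *set)
{
    if (set->count > 1)
        qsort(set->items, set->count, sizeof(Tid), tid_cmp);
    set->sorted = true;
}

/* While scanning the main index: was this TID already returned? */
bool pending_contains(const PendingTidSet *set, Tid tid)
{
    assert(set->sorted);    /* binary search requires a sorted array */
    if (set->count == 0)
        return false;
    return bsearch(&tid, set->items, set->count,
                   sizeof(Tid), tid_cmp) != NULL;
}
```

In this sketch the scan would call pending_add for every match while reading the pending pages, call pending_sort once, and then skip any main-index match for which pending_contains returns true.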

This would cost in backend-local memory, of course, but hopefully not
very much.  The advantages are the elimination of the deadlock risk
from scan-blocks-insertcleanup-blocks-insert, and fixing the race
condition when a TID previously seen in the pending list is moved to
the main index.  There are still a number of locking issues to fix
but I think they're all relatively easy to deal with.
        regards, tom lane

