Re: Potential GIN vacuum bug - Mailing list pgsql-hackers

From: Jeff Janes
Subject: Re: Potential GIN vacuum bug
Msg-id: CAMkU=1zsAe+WhaWwTMTU8LQgFSVfox7Bf+tpVDmPq6C-+VkUug@mail.gmail.com
In response to: Re: Potential GIN vacuum bug (Jeff Janes <jeff.janes@gmail.com>)
Responses: Re: Potential GIN vacuum bug (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers
On Sat, Aug 22, 2015 at 11:25 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
On Tue, Aug 18, 2015 at 8:59 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Aug 17, 2015 at 5:41 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> User backends attempt to take the lock conditionally, because otherwise they
> would cause an autovacuum already holding the lock to cancel itself, which
> seems quite bad.
>
> Note that this is a substantial behavior change, in that with this code the user
> backends which find the list already being cleaned will just add to the end
> of the pending list and go about their business.  So if things are added to
> the list faster than they can be cleaned up, the size of the pending list
> can increase without bound.
>
> Under the existing code each concurrent user backend will try to clean the
> pending list at the same time.  The work doesn't parallelize, so doing this
> just burns CPU (and possibly consumes up to maintenance_work_mem for
> *each* backend), but it does serve to throttle the insertion rate and so
> keep the list from growing without bound.
>
> This is just a proof-of-concept patch, because I don't know if this approach
> is the right approach.

I'm not sure if this is the right approach, but I'm a little wary of
involving the heavyweight lock manager in this.  If pending list
cleanups are frequent, this could involve a lot of additional lock
manager traffic, which could be bad for performance. 
 
Even if they are
infrequent, it seems like it would be more natural to handle this
with some regime of locks and pins and buffer cleanup locks on the
buffers that are storing the pending list, rather than a heavyweight
lock on the whole relation.  But I am just waving my hands wildly
here.

I also thought of a buffer cleanup lock on the pending list head buffer to represent the right to do a cleanup.  But with the proviso that once you have obtained the cleanup lock, you can then drop the exclusive buffer content lock and continue to hold the conceptual lock just by maintaining the pin.  I think that this would be semantically correct, but backends doing a cleanup would have to get the lock conditionally, and I think you would have too many chances for false failures where it bails out when the other party simply holds a pin.  I guess I could implement it and see how it fares in my test case.
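
In rough outline, that idea would look something like this (a sketch only, not
the attached patch; the helper name and the caller-supplied pending-head block
number are illustrative):

#include "postgres.h"

#include "access/gin_private.h"
#include "storage/bufmgr.h"

/*
 * Sketch only: treat a cleanup lock on the pending-list head buffer as the
 * right to run the pending-list cleanup.  The lock is taken conditionally so
 * we never block on (or cancel) whoever already holds it; once obtained, the
 * exclusive content lock is dropped and the "right to clean" is represented
 * by continuing to hold the pin, which keeps anyone else from getting a
 * cleanup lock on the same buffer.
 */
static bool
tryStartPendingListCleanup(Relation index, BlockNumber pendingHead,
                           Buffer *headBuffer)
{
    Buffer      buffer = ReadBuffer(index, pendingHead);

    if (!ConditionalLockBufferForCleanup(buffer))
    {
        /* someone else is cleaning (or merely holds a pin): bail out */
        ReleaseBuffer(buffer);
        return false;
    }

    /* keep the pin, drop the exclusive content lock */
    LockBuffer(buffer, GIN_UNLOCK);

    *headBuffer = buffer;
    return true;
}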

Attached is a patch to deal with this without the heavyweight locks. 

I realized that using the cleanup lock on the meta page, rather than the pending head page, would be easier to implement, as a pin is already held on the meta page throughout, and the pending head page can change during the cleanup if we need multiple passes.

Also, I think the cleanup lock on the metapage should actually be easier to obtain.  All queries need to visit the pending head, and they hold it long enough to check all the keys (possibly hundreds, checked with arbitrarily slow SQL functions) on that page.  The metapage is only checked for a few variables which are plain C types.

I thought of checking for metadata->head == InvalidBlockNumber with a share lock before getting the cleanup lock and then again after, but decided against it, as the user backends already check that immediately before calling this function, and wouldn't call it if there was no pending list as of that check.

I exchange the exclusive content lock given to us by LockBufferForCleanup for a share content lock, so as to not hold the exclusive lock over the I/O possibly needed to read the pending head page into buffers.  I don't know that this is actually a win.
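
Roughly, the flow described in the last few paragraphs is the following (a
simplified sketch for illustration, not the patch itself; WAL, error handling,
and the actual processing of the list pages are omitted):

#include "postgres.h"

#include "access/gin_private.h"
#include "storage/bufmgr.h"

/*
 * Sketch only: serialize pending-list cleanup on a cleanup lock held on the
 * GIN metapage, which stays pinned for the whole operation anyway.  Vacuum
 * waits for the lock; a user backend would use
 * ConditionalLockBufferForCleanup() instead and skip the cleanup on failure.
 */
static void
cleanupPendingListSketch(Relation index)
{
    Buffer      metabuffer = ReadBuffer(index, GIN_METAPAGE_BLKNO);
    GinMetaPageData *metadata;

    /* exclusive content lock plus sole pin: we now "own" the cleanup */
    LockBufferForCleanup(metabuffer);

    /*
     * Trade the exclusive content lock for a share lock; the pin we keep is
     * what continues to represent the right to do the cleanup, and a share
     * lock is enough to read the few fixed-width fields of the metapage.
     */
    LockBuffer(metabuffer, GIN_UNLOCK);
    LockBuffer(metabuffer, GIN_SHARE);

    metadata = GinPageGetMeta(BufferGetPage(metabuffer));

    if (metadata->head != InvalidBlockNumber)
    {
        /* read the pending head under the share lock only (may need I/O) */
        Buffer      headbuf = ReadBuffer(index, metadata->head);

        LockBuffer(headbuf, GIN_SHARE);
        LockBuffer(metabuffer, GIN_UNLOCK);

        /*
         * ... insert the pending tuples into the main index, follow the
         * right-links, and shrink the list, relocking the metapage as
         * needed ...
         */

        UnlockReleaseBuffer(headbuf);
    }
    else
        LockBuffer(metabuffer, GIN_UNLOCK);

    ReleaseBuffer(metabuffer);  /* dropping the pin ends our cleanup */
}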

This fixed the same problem I was seeing that the previous heavy-weight lock patch fixed.

I still prefer the heavy-weight approach.  The buffer cleanup lock for vacuuming seems fragile to start with, and abusing it for other purposes doesn't improve on that.
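
For comparison, the heavy-weight variant boils down to something like this
(also just a sketch; taking the page lock on the metapage block of the index
is an assumption here, not necessarily what the earlier patch did):

#include "postgres.h"

#include "access/gin_private.h"
#include "storage/lmgr.h"

/*
 * Sketch only: serialize pending-list cleanup with a heavyweight page lock.
 * Vacuum passes waitForLock = true and always gets to clean.  User backends
 * pass false: blocking here would make an autovacuum that holds the lock
 * cancel itself, so on failure they just append to the pending list and
 * move on.
 */
static bool
lockPendingListCleanup(Relation index, bool waitForLock)
{
    if (waitForLock)
    {
        LockPage(index, GIN_METAPAGE_BLKNO, ExclusiveLock);
        return true;
    }

    return ConditionalLockPage(index, GIN_METAPAGE_BLKNO, ExclusiveLock);
}

static void
unlockPendingListCleanup(Relation index)
{
    UnlockPage(index, GIN_METAPAGE_BLKNO, ExclusiveLock);
}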

Whichever approach is taken, more work is needed on the comments.  And the code that currently checks for concurrent cleanup and bails out needs to be changed to throw errors or something instead.  But I don't want to make too many changes until I know which approach to take, and whether it will be back-patched.

Cheers,

Jeff

Attachment
