Re: Potential GIN vacuum bug - Mailing list pgsql-hackers

From Jeff Janes
Subject Re: Potential GIN vacuum bug
Date
Msg-id CAMkU=1yj_r5vdBcOxgNVjhHwK=O3FDBmoxuOvRYc2Ec-aEgWQw@mail.gmail.com
In response to Re: Potential GIN vacuum bug  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Potential GIN vacuum bug  (Jeff Janes <jeff.janes@gmail.com>)
List pgsql-hackers
On Tue, Aug 18, 2015 at 8:59 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Aug 17, 2015 at 5:41 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> User backends attempt to take the lock conditionally, because otherwise they
> would cause an autovacuum already holding the lock to cancel itself, which
> seems quite bad.
>
> Note that this is a substantial behavior change, in that with this code the user
> backends which find the list already being cleaned will just add to the end
> of the pending list and go about their business.  So if things are added to
> the list faster than they can be cleaned up, the size of the pending list
> can increase without bound.
>
> Under the existing code each concurrent user backend will try to clean the
> pending list at the same time.  The work doesn't parallelize, so doing this
> just burns CPU (and possibly consumes up to maintenance_work_mem for
> *each* backend), but it does serve to throttle the insertion rate and so
> keep the list from growing without bound.
>
> This is just a proof-of-concept patch, because I don't know if this approach
> is the right approach.
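
The conditional-acquire behavior described in the quoted text can be illustrated with a toy model. This is only a sketch, not PostgreSQL code: the `PendingList` class, its `cleanup_lock`, and the `threshold` parameter are hypothetical names invented for illustration, and a plain Python mutex stands in for the heavyweight lock.

```python
import threading


class PendingList:
    """Toy stand-in for a GIN pending list (hypothetical, not PostgreSQL code)."""

    def __init__(self):
        self.entries = []
        # Hypothetical lock representing "the right to clean the pending list".
        self.cleanup_lock = threading.Lock()
        self.cleanups = 0

    def insert(self, item, threshold):
        """Append an item; clean up only if the lock is free right now.

        Returns True if this call performed a cleanup, False otherwise.
        (A real implementation would also protect the list itself; this
        sketch is meant to be driven from a single thread.)
        """
        self.entries.append(item)
        if len(self.entries) < threshold:
            return False
        # Conditional acquire: never wait, so a holder such as autovacuum
        # is never disturbed by this backend.
        if not self.cleanup_lock.acquire(blocking=False):
            # Someone else is cleaning: just go about our business.
            return False
        try:
            self.entries.clear()
            self.cleanups += 1
            return True
        finally:
            self.cleanup_lock.release()
```

While another party holds `cleanup_lock`, every `insert` call returns without cleaning and the list keeps growing, which is the unbounded-growth behavior noted above.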

I'm not sure if this is the right approach, but I'm a little wary of
involving the heavyweight lock manager in this.  If pending list
cleanups are frequent, this could involve a lot of additional lock
manager traffic, which could be bad for performance. 

I don't think 10 cleanups a second should be a problem (and most of those would probably fail to acquire the lock, but I don't know if that would make a difference).  If there were several hundred per second, I think you would have bigger problems than traffic through the lock manager.  In that case, it is time to either turn off fastupdate, or increase the pending list size.  

As a mini-vacuum, it seems natural to me to hold a lock of the same nature as a regular vacuum, but just on the one index involved rather than the whole table.
 
Even if they are
infrequent, it seems like it would be more natural to handle this
without some regime of locks and pins and buffer cleanup locks on the
buffers that are storing the pending list, rather than a heavyweight
lock on the whole relation.  But I am just waving my hands wildly
here.

I also thought of a buffer clean up lock on the pending list head buffer to represent the right to do a clean up.  But with the proviso that once you have obtained the clean up lock, you can then drop the exclusive buffer content lock and continue to hold the conceptual lock just by maintaining the pin.  I think that this would be semantically correct, but backends doing a cleanup would have to get the lock conditionally, and I think you would have too many chances for false failures where it bails out when the other party simply holds a pin.  I guess I could implement it and see how it fares in my test case.
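
The pin-based idea and its false-failure risk can be sketched with a toy model. This is an illustration under stated assumptions, not the buffer manager's actual logic: `ToyBuffer`, `pin`, `unpin`, and `try_cleanup_lock` are hypothetical names, and the only rule modeled is that a cleanup lock can be taken when the requester holds the sole pin on the buffer.

```python
class ToyBuffer:
    """Toy model of pins on the pending list head buffer (hypothetical)."""

    def __init__(self):
        self.pinned_by = set()  # names of backends currently holding a pin

    def pin(self, backend):
        self.pinned_by.add(backend)

    def unpin(self, backend):
        self.pinned_by.discard(backend)

    def try_cleanup_lock(self, backend):
        """Conditional attempt: succeed only if `backend` holds the sole pin.

        Never waits. Any other pin, whether held by a cleaner or by a
        backend that is merely looking at the page, makes this fail.
        """
        return self.pinned_by == {backend}
```

In this model a backend that wins `try_cleanup_lock` keeps the conceptual right to clean for as long as it keeps its pin, because no other backend can then be the sole pin holder. The same rule produces the false failures mentioned above: an attempt also fails when the other pin holder is not cleaning at all.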

Cheers,

Jeff
