Re: GIN data corruption bug(s) in 9.6devel - Mailing list pgsql-hackers

From Noah Misch
Subject Re: GIN data corruption bug(s) in 9.6devel
Date
Msg-id 20160412064322.GA1818418@tornado.leadboat.com
In response to Re: GIN data corruption bug(s) in 9.6devel  (Jeff Janes <jeff.janes@gmail.com>)
List pgsql-hackers
On Thu, Apr 07, 2016 at 05:53:54PM -0700, Jeff Janes wrote:
> On Thu, Apr 7, 2016 at 4:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Jeff Janes <jeff.janes@gmail.com> writes:
> >> To summarize the behavior change:
> >
> >> In the released code, an inserting backend that violates the pending
> >> list limit will try to clean the list, even if it is already being
> >> cleaned.  It won't accomplish anything useful, but will go through the
> >> motions until eventually it runs into a page the lead cleaner has
> >> deleted, at which point it realizes there is another cleaner and it
> >> stops.  This acts as a natural throttle to how fast insertions can
> >> take place into an over-sized pending list.
> >
> > Right.
> >
> >> The proposed change removes that throttle, so that inserters will
> >> immediately see there is already a cleaner and just go back about
> >> their business.  Due to that, unthrottled backends could add to the
> >> pending list faster than the cleaner can clean it, leading to
> >> unbounded growth in the pending list and could cause a user backend to
> >> become apparently unresponsive to the user, indefinitely.  That is
> >> scary to backpatch.
> >
> > It's scary to put into HEAD, either.  What if we simply don't take
> > that specific behavioral change?  It doesn't seem like this is an
> > essential part of fixing the bug as you described it.  (Though I've
> > not read the patch, so maybe I'm just missing the connection.)
> 
> There are only 3 fundamental options I see: the cleaner can wait,
> "help", or move on.
> 
> "Helping" is what it does now and is dangerous.
> 
> Moving on gives the above-discussed unthrottling problem.
> 
> Waiting has two problems.  The act of waiting will cause autovacuums
> to be canceled, unless ugly hacks are deployed to prevent that.   If
> we deploy those ugly hacks, then we have the problem that a user
> backend will end up waiting on an autovacuum to finish the cleaning,
> and the autovacuum is taking its sweet time due to
> autovacuum_vacuum_cost_delay.

Teodor, this thread has been quiet for four days, and the deadline to fix this
open item expired 23 hours ago.  Do you have a new plan for fixing it?


