Re: [sqlsmith] PANIC: failed to add BRIN tuple - Mailing list pgsql-hackers

From Alvaro Herrera
Subject Re: [sqlsmith] PANIC: failed to add BRIN tuple
Msg-id 20160523164114.GA391570@alvherre.pgsql
In response to [sqlsmith] PANIC: failed to add BRIN tuple  (Andreas Seltenreich <seltenreich@gmx.de>)
List pgsql-hackers
Andreas Seltenreich wrote:
> There was one instance of this PANIC when testing with the regression db
> of master at 50e5315.
> 
> ,----
> | WARNING:  specified item offset is too large
> | PANIC:  failed to add BRIN tuple
> | server closed the connection unexpectedly
> `----
> 
> It is reproducible with the query below on this instance only.

Hm, so this is an over-eager check.  As I understand it, the page was
filled with up to 10 tuples, then tuples 0-8 were removed, probably moved
to other pages.  When this update runs, it needs to update the remaining
tuple, which is a normal thing for BRIN to do.  But BRIN doesn't want the
item number to change, because it's referenced from the "revmap"; so it
removes the item and then wants to insert it again at the same offset.
But bufpage.c is not accustomed to callers wanting to put items beyond
the last currently used item plus one, so it raises a warning and returns
without doing the insert.  BRIN has no way to recover from that, hence
the PANIC.
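
To make the sequence above concrete, here is a minimal toy model in C.  It
is only a sketch of the reasoning, not PostgreSQL code: ToyPage,
toy_add_item(), toy_delete_item(), toy_replay_update() and the allow_gap
parameter are all hypothetical names invented for illustration, and the
model assumes that removing the last used item lowers the "last used item
plus one" limit.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_ITEMS      16
#define TOY_INVALID_OFFSET 0    /* stands in for "insert refused" */

/* Hypothetical, simplified page: only tracks which 1-based slots are used. */
typedef struct ToyPage
{
    bool used[TOY_MAX_ITEMS + 1];   /* index 0 is never used */
    int  last_used;                 /* highest slot currently in use */
} ToyPage;

/*
 * Add an item at a caller-chosen offset.  Without allow_gap, any offset
 * beyond "last used item plus one" is refused, which is the check the
 * email describes.  allow_gap models the flag the email only proposes.
 */
static int
toy_add_item(ToyPage *page, int offset, bool allow_gap)
{
    int limit = page->last_used + 1;

    if (offset < 1 || offset > TOY_MAX_ITEMS)
        return TOY_INVALID_OFFSET;
    if (offset > limit && !allow_gap)
        return TOY_INVALID_OFFSET;  /* "specified item offset is too large" */
    if (page->used[offset])
        return TOY_INVALID_OFFSET;  /* slot already occupied */

    page->used[offset] = true;
    if (offset > page->last_used)
        page->last_used = offset;
    return offset;
}

/* Remove an item; assumption: the limit shrinks when trailing slots empty. */
static void
toy_delete_item(ToyPage *page, int offset)
{
    page->used[offset] = false;
    while (page->last_used > 0 && !page->used[page->last_used])
        page->last_used--;
}

/* Replay the scenario: fill 10 slots, drop 9, then delete and re-add #10. */
static int
toy_replay_update(bool allow_gap)
{
    ToyPage page = {0};
    int     i;

    for (i = 1; i <= 10; i++)
    {
        int got = toy_add_item(&page, i, false);

        assert(got == i);
        (void) got;
    }
    for (i = 1; i <= 9; i++)
        toy_delete_item(&page, i);  /* tuples moved to other pages */

    toy_delete_item(&page, 10);     /* update step 1: remove the item */
    return toy_add_item(&page, 10, allow_gap);  /* step 2: same offset */
}
```

In this model toy_replay_update(false) is refused, because once the only
remaining item is removed the limit falls back to 1 while the caller still
asks for offset 10; toy_replay_update(true) succeeds and the item keeps
its number.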

I tried simply removing the "return InvalidOffsetNumber" line from that
block (so that it still throws the warning but does execute the index
insert), and everything seems to behave correctly.  I suppose a simple
fix would be to add a flag to PageAddItem() and skip this block when the
flag is set, but that would break the ABI in 9.5.  I'm not really sure
what a good fix is yet.  Maybe a new routine specific to the needs of
BRIN is called for.

I would also like to come up with a way to have this scenario be tested
by a new regression test.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


