Re: Breakage with VACUUM ANALYSE + partitions - Mailing list pgsql-bugs

From Andres Freund
Subject Re: Breakage with VACUUM ANALYSE + partitions
Msg-id 20160325164927.c522xd7nw2a74yu5@alap3.anarazel.de
In response to Re: Breakage with VACUUM ANALYSE + partitions  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-bugs
On 2016-03-25 12:02:05 -0400, Robert Haas wrote:
> On Fri, Mar 25, 2016 at 8:41 AM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
> > On Thu, Mar 24, 2016 at 9:40 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> >> On Thu, Mar 24, 2016 at 12:59 AM, Haribabu Kommi
> >> <kommi.haribabu@gmail.com> wrote:
> >>> So further operations on the table use the already constructed smgr relation,
> >>> treat it as having RELSEG_SIZE blocks, and try to do the scan. But there
> >>> are 0 pages in the table, so the scan produces the error.
> >>>
> >>> The issue doesn't occur from another session. For the same reason, the
> >>> error does not occur if we run only the VACUUM operation.
> >>
> >> Yeah, I had a suspicion that this might have to do with invalidation
> >> messages based on Thom's description, but I think we still need to
> >> track down which commit is at fault.
> >
> > I could reproduce the failure on Linux, not on OSX, and bisecting the
> > failure, the first bad commit is this one:
> > commit: 428b1d6b29ca599c5700d4bc4f4ce4c5880369bf
> > author: Andres Freund <andres@anarazel.de>
> > date: Thu, 10 Mar 2016 17:04:34 -0800
> > Allow to trigger kernel writeback after a configurable number of writes.
> >
> > The failure is somewhat sporadic: in my tests 1 or 2 runs out of 10
> > passed, so I only marked a commit as good after it passed the SQL
> > sequence sent by Thom 5 times in a row. I also did some manual tests,
> > and those point to this commit as well.
> >
> > I am adding Fabien and Andres in CC for some feedback.
>
> Gosh, that's surprising.  I wonder if that just revealed an underlying
> issue rather than creating it.

I think that might be the case, but I'm not entirely sure yet. It
appears to me that the current backend - others don't show the problem -
still has the first segment of pgbench_accounts open (in the md.c
mdfdvec sense); likely because there were remaining flush
requests. Thus, when mdnblocks is called to get the size of the relation,
we return the size of the first segment (131072 blocks, i.e. RELSEG_SIZE)
plus the size of the second segment (0, as it doesn't exist). That then
leads to this error.
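
To make the arithmetic concrete, here is a minimal sketch of the size calculation described above. It is a simplified model, not md.c's actual code: the names SketchRelation, sketch_nblocks and SKETCH_RELSEG_SIZE are hypothetical, and it assumes that, as in mdnblocks, counting starts at the last segment the backend still has open and every earlier segment is taken to be exactly RELSEG_SIZE blocks long without being re-measured.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for RELSEG_SIZE: blocks per 1 GB segment with 8 kB pages. */
#define SKETCH_RELSEG_SIZE 131072u

/*
 * Hypothetical model of one relation's segment files plus a backend's
 * cached view of which segments are open. Assumes nsegs_on_disk <= 4.
 */
typedef struct
{
    unsigned    file_blocks[4];   /* actual on-disk size of segment i, in blocks */
    size_t      nsegs_on_disk;    /* number of segment files that really exist */
    size_t      nsegs_open;       /* segments this backend still has open */
} SketchRelation;

/*
 * Simplified mdnblocks()-style count: start at the last cached open
 * segment, assume all earlier segments are full, and measure only from
 * there onwards. A missing segment file counts as 0 blocks.
 */
static unsigned
sketch_nblocks(const SketchRelation *rel)
{
    size_t      segno = rel->nsegs_open > 0 ? rel->nsegs_open - 1 : 0;
    unsigned    nblocks = (unsigned) segno * SKETCH_RELSEG_SIZE;

    for (;;)
    {
        unsigned    segblocks =
            segno < rel->nsegs_on_disk ? rel->file_blocks[segno] : 0;

        /* md.c reports an error for a segment larger than RELSEG_SIZE */
        assert(segblocks <= SKETCH_RELSEG_SIZE);

        nblocks += segblocks;
        if (segblocks < SKETCH_RELSEG_SIZE)
            return nblocks;     /* a short (or missing) segment is the last one */
        segno++;
    }
}
```

In this model, a table truncated to zero blocks (one empty segment on disk) but with a stale cache still holding two open segments yields 131072 blocks, the symptom described above, while an up-to-date cache with one open segment yields 0.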

I don't really understand yet how the "open segment" thing happens.
