Re: Vacuum/visibility is busted - Mailing list pgsql-hackers

From Pavan Deolasee
Subject Re: Vacuum/visibility is busted
Msg-id CABOikdPr8-29NEta1grOi7=FyVVtc5gA5NLtOX6O6M=gLDsDqA@mail.gmail.com
In response to Vacuum/visibility is busted  (Jeff Janes <jeff.janes@gmail.com>)
Responses Re: Vacuum/visibility is busted  (Pavan Deolasee <pavan.deolasee@gmail.com>)
Re: Vacuum/visibility is busted  (Jeff Janes <jeff.janes@gmail.com>)
List pgsql-hackers
On Thu, Feb 7, 2013 at 11:09 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
> While stress testing Pavan's 2nd pass vacuum visibility patch, I realized
> that vacuum/visibility was busted.  But it wasn't his patch that busted it.
> As far as I can tell, the bad commit was in the range
> 692079e5dcb331..168d3157032879
>
> Since a run takes 12 to 24 hours, it will take a while to refine that
> interval.
>
> I was testing using the framework explained here:
>
> http://www.postgresql.org/message-id/CAMkU=1xoA6Fdyoj_4fMLqpicZR1V9GP7cLnXJdHU+iGgqb6WUw@mail.gmail.com
>
> Except that I increased JJ_torn_page to 8000, so that autovacuum has a
> chance to run to completion before each crash; and I turned off archive_mode
> as it was not relevant and caused annoying noise.  As far as I know,
> crashing is entirely irrelevant to the current problem, but I just used and
> adapted the framework I had at hand.
>
> A tarball of the data directory is available below, for those who would
> like to do a forensic inspection.  The table jjanes.public.foo is clearly in
> violation of its unique index.

The xmins of all the duplicate tuples look dangerously close to 2^31.
I wonder if XID wraparound has anything to do with it.
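For readers wondering why 2^31 in particular matters: PostgreSQL compares normal XIDs modulo 2^32, so each XID treats roughly 2^31 XIDs as older than itself and 2^31 as newer. A minimal sketch of that comparison, modeled on TransactionIdPrecedes() in transam.c; the name xid_precedes() is hypothetical and the code is a simplified illustration, not the server source:

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* XIDs 0..2 are special (invalid, bootstrap, frozen) and compare as plain integers. */
#define FirstNormalTransactionId ((TransactionId) 3)
#define TransactionIdIsNormal(xid) ((xid) >= FirstNormalTransactionId)

/*
 * Hypothetical helper modeled on TransactionIdPrecedes(): two normal XIDs
 * are compared modulo 2^32, so id1 "precedes" id2 when the signed 32-bit
 * difference is negative.
 */
static bool
xid_precedes(TransactionId id1, TransactionId id2)
{
    if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
        return id1 < id2;

    /*
     * Unsigned subtraction wraps modulo 2^32; the result is reinterpreted
     * as signed. Converting an out-of-range uint32_t to int32_t is
     * implementation-defined before C23, but mainstream compilers wrap,
     * and PostgreSQL relies on the same behavior.
     */
    int32_t diff = (int32_t) (id1 - id2);
    return diff < 0;
}
```

An XID only compares correctly against XIDs less than 2^31 away from it, which is why tuples must be frozen before the XID counter advances that far past their xmin.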

Index scans do not return any duplicates; you need to force a seq
scan to see them. Suspecting that the page-level visibility map (VM)
bit might be corrupted, I tried to REINDEX the table to see if it
complains about unique key violations, but that crashes the server with

TRAP: FailedAssertion("!(((bool) ((root_offsets[offnum - 1] != ((OffsetNumber) 0)) && (root_offsets[offnum - 1] <= ((OffsetNumber) (8192 / sizeof(ItemIdData)))))))", File: "index.c", Line: 2482)
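The assertion text is the macro expansion of OffsetNumberIsValid(root_offsets[offnum - 1]): the HOT-chain root recorded for the tuple at offnum must be nonzero and no larger than MaxOffsetNumber, which for 8 kB pages is 8192 / sizeof(ItemIdData) = 2048. A minimal sketch of that check, assuming BLCKSZ = 8192 and a 4-byte line pointer; offset_number_is_valid() and the uint32_t stand-in for ItemIdData are hypothetical simplifications, not the real headers:

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint16_t OffsetNumber;

/*
 * Assumption: 8 kB blocks. The real ItemIdData packs lp_off:15,
 * lp_flags:2 and lp_len:15 into 32 bits; a plain uint32_t stands in
 * for it here so that sizeof(ItemIdData) == 4.
 */
#define BLCKSZ 8192
typedef uint32_t ItemIdData;

#define InvalidOffsetNumber ((OffsetNumber) 0)
#define MaxOffsetNumber ((OffsetNumber) (BLCKSZ / sizeof(ItemIdData)))

/* Hypothetical helper: the same test the failed Assert expands to. */
static bool
offset_number_is_valid(OffsetNumber off)
{
    return off != InvalidOffsetNumber && off <= MaxOffsetNumber;
}
```

Assuming root_offsets is filled by heap_get_root_tuples(), which leaves 0 for any tuple not reached from a HOT-chain root, the trap suggests REINDEX met a heap-only tuple whose chain root could not be found on its page.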

I will look into it more, but thought this might help others spot
the problem.

Thanks,
Pavan

P.S. BTW, you need to connect as user "jjanes" to the database
"jjanes" to see the offending table.

-- 
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee
