Re: Vacuum/visibility is busted - Mailing list pgsql-hackers

From Jeff Janes
Subject Re: Vacuum/visibility is busted
Msg-id CAMkU=1zqb0VTxbfRQqDNy4Zr5X8m0nuTa-CC6EDAO9yitpXUpw@mail.gmail.com
In response to Re: Vacuum/visibility is busted  (Pavan Deolasee <pavan.deolasee@gmail.com>)
Responses Re: Vacuum/visibility is busted  (Jeff Janes <jeff.janes@gmail.com>)
Re: Vacuum/visibility is busted  (Alvaro Herrera <alvherre@2ndquadrant.com>)
List pgsql-hackers
On Thu, Feb 7, 2013 at 12:55 AM, Pavan Deolasee
<pavan.deolasee@gmail.com> wrote:
> On Thu, Feb 7, 2013 at 11:09 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
>> While stress testing Pavan's 2nd pass vacuum visibility patch, I realized
>> that vacuum/visibility was busted.  But it wasn't his patch that busted it.
>> As far as I can tell, the bad commit was in the range
>> 692079e5dcb331..168d3157032879
>>
>> Since a run takes 12 to 24 hours, it will take a while to refine that
>> interval.
>>
>> I was testing using the framework explained here:
>>
>> http://www.postgresql.org/message-id/CAMkU=1xoA6Fdyoj_4fMLqpicZR1V9GP7cLnXJdHU+iGgqb6WUw@mail.gmail.com
>>
>> Except that I increased JJ_torn_page to 8000, so that autovacuum has a
>> chance to run to completion before each crash; and I turned off archive_mode
>> as it was not relevant and caused annoying noise.  As far as I know,
>> crashing is entirely irrelevant to the current problem, but I just used and
>> adapted the framework I had at hand.
>>
>> A tarball of the data directory is available below, for those who would
>> like to do a forensic inspection.  The table jjanes.public.foo is clearly in
>> violation of its unique index.
>
> The xmins of all the duplicate tuples look dangerously close to 2^31.
> I wonder if XID wrap around has anything to do with it.
>
> Index scans do not return any duplicates and you need to force a seq
> scan to see them. Assuming that the page level VM bit might be
> corrupted, I tried to REINDEX the table to see if it complains of
> unique key violations, but that crashes the server with
>
> TRAP: FailedAssertion("!(((bool) ((root_offsets[offnum - 1] !=
> ((OffsetNumber) 0)) && (root_offsets[offnum - 1] <= ((OffsetNumber)
> (8192 / sizeof(ItemIdData)))))))", File: "index.c", Line: 2482)

I don't see the assertion failure myself.  REINDEX INDEX reports a
duplicate key violation, whereas REINDEX TABLE and REINDEX DATABASE
both report success.

This happens whether I start the cluster with binaries built from
current head (ab0f7b6) or from 168d315.
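
On the XID wraparound suspicion quoted above: the reason xmins near 2^31
draw attention is that normal XIDs are compared on a circle modulo 2^32, so
anything more than 2^31 transactions back stops looking "older".  Below is a
minimal Python sketch of that comparison rule.  It is an illustration only:
the function name and the sample values are hypothetical, it is meant to
mirror the signed 32-bit difference test used for normal XIDs, and it makes
no claim that wraparound is the actual cause of the duplicates here.

```python
# Sketch of circular (modulo 2^32) transaction ID comparison.
# Assumption: mirrors the signed-difference rule used for normal XIDs;
# xid_precedes and the sample values below are illustrative, not taken
# from the corrupted cluster.

FIRST_NORMAL_XID = 3  # XIDs 0-2 are reserved (invalid, bootstrap, frozen)


def xid_precedes(a: int, b: int) -> bool:
    """Return True if XID a is logically older than XID b."""
    # Reserved XIDs are compared as plain unsigned integers.
    if a < FIRST_NORMAL_XID or b < FIRST_NORMAL_XID:
        return a < b
    diff = (a - b) & 0xFFFFFFFF      # unsigned 32-bit difference
    if diff >= 0x80000000:           # reinterpret as a signed int32
        diff -= 0x100000000
    return diff < 0


if __name__ == "__main__":
    xmin = 2**31 - 10                # hypothetical xmin close to 2^31

    # A slightly newer XID sees xmin as older, as expected.
    print(xid_precedes(xmin, xmin + 5))

    # Once the counter has advanced more than 2^31 past xmin, the same
    # xmin no longer compares as older: it appears to be in the future.
    later = (xmin + 2**31 + 1) & 0xFFFFFFFF
    print(xid_precedes(xmin, later))
```

If a tuple's xmin is never frozen and the XID counter advances that far, a
comparison like this would flip, which is why the values Pavan noticed are
worth checking.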

Cheers,

Jeff


