Re: Index corruption - Mailing list pgsql-hackers

From Marc Munro
Subject Re: Index corruption
Msg-id 1151632823.3913.97.camel@bloodnok.com
In response to Re: Index corruption  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Index corruption  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Thu, 2006-06-29 at 21:47 -0400, Tom Lane wrote:
> One easy thing that would be worth trying is to build with
> --enable-cassert and see if any Asserts get provoked during the
> failure case.  I don't have a lot of hope for that, but it's
> something that would require only machine time not people time.

I'll try this tomorrow.

> A couple other things to try, given that you can provoke the failure
> fairly easily:
>
> 1. In studying the code, it bothers me a bit that P_NEW is the same as
> InvalidBlockNumber.  The intended uses of P_NEW appear to be adequately
> interlocked, but it's fairly easy to see how something like this could
> happen if there are any places where InvalidBlockNumber is
> unintentionally passed to ReadBuffer --- that would look like a P_NEW
> call and it *wouldn't* be interlocked.  So it would be worth changing
> P_NEW to "(-2)" (this should just take a change in bufmgr.h and
> recompile) and adding an "Assert(blockNum != InvalidBlockNumber)"
> at the head of ReadBufferInternal().  Then rebuild with asserts enabled
> and see if the failure case provokes that assert.

I'll try this too.
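
For reference, a minimal sketch of why the aliasing in item 1 matters. The typedef and macros below are simplified stand-ins for the definitions in the PostgreSQL headers, and `P_NEW_OLD`, `P_NEW_DEBUG`, `is_extend_request` and `would_trip_assert` are hypothetical names made up for illustration; none of this is the actual bufmgr code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for the PostgreSQL definitions (assumption). */
typedef uint32_t BlockNumber;
#define InvalidBlockNumber ((BlockNumber) 0xFFFFFFFF)

/* Current situation: the "extend the relation" sentinel aliases the
 * "no such block" sentinel. */
#define P_NEW_OLD   InvalidBlockNumber

/* Proposed debugging change: a distinct sentinel, (-2) as a BlockNumber. */
#define P_NEW_DEBUG ((BlockNumber) -2)

/* With the change, the two sentinels can no longer be confused. */
static_assert(P_NEW_DEBUG != InvalidBlockNumber,
              "debug sentinel must differ from InvalidBlockNumber");

/* Hypothetical helper: would a read of blockNum be treated as a request
 * to extend the relation, given the sentinel p_new? */
static bool
is_extend_request(BlockNumber blockNum, BlockNumber p_new)
{
    return blockNum == p_new;
}

/* Hypothetical helper modelling the proposed
 * Assert(blockNum != InvalidBlockNumber): true when it would fire. */
static bool
would_trip_assert(BlockNumber blockNum)
{
    return blockNum == InvalidBlockNumber;
}
```

With the old definition, a stray InvalidBlockNumber is indistinguishable from a deliberate P_NEW, so `is_extend_request(InvalidBlockNumber, P_NEW_OLD)` is true. With the distinct sentinel it is false, and the added assertion is what would catch the stray value.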

> 2. I'm also eyeing this bit of code in hio.c:
>
>         /*
>          * If the FSM knows nothing of the rel, try the last page before
>          * we give up and extend.  This avoids one-tuple-per-page syndrome
>          * during bootstrapping or in a recently-started system.
>          */
>         if (targetBlock == InvalidBlockNumber)
>         {
>             BlockNumber nblocks = RelationGetNumberOfBlocks(relation);
>
>             if (nblocks > 0)
>                 targetBlock = nblocks - 1;
>         }
>
> If someone else has just extended the relation, it's possible that this
> will allow a process to get to the page before the intended extender has
> finished initializing it.  AFAICT that's not harmful because the page
> will look like it has no free space ... but it seems a bit fragile.
> If you dike out the above-mentioned code, can you still provoke the
> failure?

By "dike out", you mean remove?  Please confirm and I'll try it.
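
One way to picture the experiment in item 2, assuming "dike out" does mean disabling the fallback: the sketch below models only the target-page choice, with a flag to switch the last-page fallback on or off. `choose_target_block` and its parameters are hypothetical names for illustration, and the types are simplified stand-ins; this is not the code in hio.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins for the PostgreSQL definitions (assumption). */
typedef uint32_t BlockNumber;
#define InvalidBlockNumber ((BlockNumber) 0xFFFFFFFF)

/*
 * Hypothetical model of the target-page choice.
 *
 * fsm_block     - block suggested by the free space map, or
 *                 InvalidBlockNumber if the FSM knows nothing of the rel
 * nblocks       - current length of the relation in blocks
 * try_last_page - true keeps the fallback quoted above, false models
 *                 the code with the fallback removed
 *
 * Returns the block to try, or InvalidBlockNumber meaning "give up and
 * extend the relation".
 */
static BlockNumber
choose_target_block(BlockNumber fsm_block, BlockNumber nblocks,
                    bool try_last_page)
{
    BlockNumber targetBlock = fsm_block;

    /* Model precondition: a block count is never the invalid sentinel. */
    assert(nblocks != InvalidBlockNumber);

    if (try_last_page && targetBlock == InvalidBlockNumber)
    {
        /* FSM had no suggestion: fall back to the last existing page. */
        if (nblocks > 0)
            targetBlock = nblocks - 1;
    }
    return targetBlock;
}
```

With the fallback enabled, an empty FSM answer on a 10-block relation yields block 9, which is the page a concurrent extender may still be initializing. With it disabled, the same call returns InvalidBlockNumber and the caller goes straight to extending.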

> A different line of attack is to see if you can make a self-contained
> test case so other people can try to reproduce it.  More eyeballs on the
> problem are always better.

I can't really see this being possible.  This is clearly a very unusual
problem, and without similar hardware I doubt that anyone else will
trigger it.  We ran this system happily for nearly a year on the
previous kernel without experiencing this problem (TCP lockups are a
different matter).  Also, the load is provided by a bunch of servers and
robots simulating rising and falling load, which would be hard to
package up as a test case.

> Lastly, it might be interesting to look at the WAL logs for the period
> leading up to a failure.  This would give us an idea of what was
> happening concurrently with the processes that seem directly involved.

Next time we reproduce it, I'll take a copy of the WAL files too.

__
Marc
