Thread: right sibling is not next child

right sibling is not next child

From

"Kevin Grittner"

Date:

06 April 2006, 16:46:28

Apologies if this is a duplicate, but my original post stalled and I
noticed I had omitted the postgres version, which you will want.

I'm reporting this as a PostgreSQL bug because it involves an index
corruption.  I can't see any other way our application should be able to
corrupt an index.  I will attach the tail of the log when the corruption
was detected (and the postmaster shut itself down), as well as the
subsequent attempt to start.  Fortunately we run our web site off of a
farm of four database servers, so we are taking one of the others out of
the mix, stopping postmaster, and copying its data directory over to
this machine for recovery, so we don't need advice on that aspect of
things; but, we'd like to do what we can to help track down the cause to
prevent a recurrence.  We have renamed the data directory to make room
for recovery at the normal location, but otherwise the failing data
directory structure is unmodified.

For context, this is running on Windows 2003 Server.  Eight Xeon box,
no HT, 6 GB RAM, 13 drive 15,000 RPM RAID5 array through battery backed
controller for everything.  This database is about 180 GB with about 300
tables.  We are running 8.1.3 modified with a patch we have submitted
(pending review last I saw) to implement the standard_conforming_strings
TODO.  We have autovacuum running every ten seconds because of a few
very small tables with very high update rates, and we have a scheduled
VACUUM ANALYZE VERBOSE every night.  It appears that last night's vacuum
found the problem, which the previous night's vacuum didn't.  We had
some event which started at 14:25 yesterday which persisted until we
restarted the middle tier at 15:04.  The symptom was that a fraction of
the queries which normally run in a few ms were timing out on a 20
second limit.  pg_locks showed no blocking.  We've been getting episodes
with these symptoms occassionally, but they have only lasted a minute or
two; this duration was unusual.  We haven't identified a cause.  One odd
thing is that with the number of queries per second that we run, the
number of timeouts during an episode is too small to support the notion
that _all_ similar queries are failing.

How best to proceed?

-Kevin

Attachment

Re: right sibling is not next child

From

Tom Lane

Date:

06 April 2006, 17:40:26

"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> [2006-04-06 02:19:57.460 ] 3848 <postgres bigbird 127.0.0.1(3944)> PANIC:  right sibling is not next child in
"Panel_pkey"

This should be repeatable by re-attempting a VACUUM, right?  Please find
out which page exactly it's unhappy about (either gdb the crash or add a
printout of the "parent" variable to the elog call in nbtpage.c), then
pg_filedump the index and look to see what the index contains.

            regards, tom lane

Re: right sibling is not next child

From

"Kevin Grittner"

Date:

06 April 2006, 17:45:39

Right now the postmaster refuses to start.  What is the best way to get
past that to try what you suggest?

[2006-04-06 07:22:50.347 ] 3984 LOG:  database system was interrupted
while in recovery at 2006-04-06 02:19:59 Central Daylight Time
[2006-04-06 07:22:50.347 ] 3984 HINT:  This probably means that some
data is corrupted and you will have to use the last backup for
recovery.
[2006-04-06 07:22:50.347 ] 3984 LOG:  checkpoint record is at
F6/50052F10
[2006-04-06 07:22:50.347 ] 3984 LOG:  redo record is at F6/50052F10;
undo record is at 0/0; shutdown FALSE
[2006-04-06 07:22:50.347 ] 3984 LOG:  next transaction ID: 313381941;
next OID: 2321597
[2006-04-06 07:22:50.347 ] 3984 LOG:  next MultiXactId: 1; next
MultiXactOffset: 0
[2006-04-06 07:22:50.347 ] 3984 LOG:  database system was not properly
shut down; automatic recovery in progress
[2006-04-06 07:22:50.347 ] 3984 LOG:  redo starts at F6/50052F58
[2006-04-06 07:22:50.347 ] 4076 <bigbird bigbird 127.0.0.1(2236)>
FATAL:  the database system is starting up
[2006-04-06 07:22:50.347 ] 4064 <bigbird bigbird 127.0.0.1(2235)>
FATAL:  the database system is starting up
[2006-04-06 07:22:50.378 ] 3984 PANIC:  heap_clean_redo: no block

-Kevin


>>> On Thu, Apr 6, 2006 at 12:40 pm, in message
<25774.1144345220@sss.pgh.pa.us>,
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
>> [2006- 04- 06 02:19:57.460 ] 3848 <postgres bigbird 127.0.0.1(3944)>
PANIC:  right
> sibling is not next child in "Panel_pkey"
>
> This should be repeatable by re- attempting a VACUUM, right?  Please
find
> out which page exactly it's unhappy about (either gdb the crash or
add a
> printout of the "parent" variable to the elog call in nbtpage.c),
then
> pg_filedump the index and look to see what the index contains.
>
>             regards, tom lane

Re: right sibling is not next child

From

Tom Lane

Date:

06 April 2006, 17:57:35

"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> Right now the postmaster refuses to start.  What is the best way to get
> past that to try what you suggest?

> [2006-04-06 07:22:50.378 ] 3984 PANIC:  heap_clean_redo: no block

Hm, did this start happening immediately after the other problem?
That would suggest that you've got worse problems than just a corrupt
index.  You weren't by any chance running with full_page_writes = off
were you?

You could get past the startup failure with pg_resetxlog, but it's not
clear whether you'd have a consistent database afterward.  What I'd
suggest first is saving a copy of the entire $PGDATA tree for forensic
purposes (not to mention being able to go back to that state if you need
to).

Is there any chance of letting someone else have a look at the database
contents?  Otherwise you're going to have to do your own sleuthing to
figure out what went wrong ...

            regards, tom lane

Re: right sibling is not next child

From

"Kevin Grittner"

Date:

06 April 2006, 18:16:05

>>> On Thu, Apr 6, 2006 at 12:57 pm, in message
<25913.1144346246@sss.pgh.pa.us>,
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
>> Right now the postmaster refuses to start.  What is the best way to
get
>> past that to try what you suggest?
>
>> [2006- 04- 06 07:22:50.378 ] 3984 PANIC:  heap_clean_redo: no block
>
> Hm, did this start happening immediately after the other problem?

This started happening on the first attempt to start the postmaster
after the other error, which left the postmaster down, apparently after
a failed restart attempt.

> That would suggest that you've got worse problems than just a
corrupt
> index.  You weren't by any chance running with full_page_writes =
off
> were you?

Yes we were.  Apparently I have misunderstood the implications of this.
 Somehow I had convinced myself that this setting was relatively safe in
our environment, due to our battery-backed controllers.  I'd convinced
myself, after reading carefully through the documentation of this
setting, that I would be OK as long as that functioned correctly, and
have problems regardless of this setting if it didn't.  If you show me
where I went wrong, maybe I can suggest a patch to the docs to prevent
others from going down the wrong path in this regard.  (Of course, maybe
it's all there and I just had a bad day when I thought this through.)

> You could get past the startup failure with pg_resetxlog, but it's
not
> clear whether you'd have a consistent database afterward.  What I'd
> suggest first is saving a copy of the entire $PGDATA tree for
forensic
> purposes

We already have this forensic copy and a replacement production copy on
this box.  I think we'll need to copy to another box to get a second
forensic copy, to avoid risking an out-of-space condition.  That can be
done, but it'll take a few hours.

> (not to mention being able to go back to that state if you need
> to).

That's not an issue for production purposes.

> Is there any chance of letting someone else have a look at the
database
> contents?

There is a lot of data in the database which is confidential by law.
I'd have to jump through a lot of hoops to get anyone to even consider
letting me ship it off site.  If you're asking whether you could access
in to our site, that might be arranged.

-Kevin

Re: right sibling is not next child

From

Tom Lane

Date:

06 April 2006, 18:27:26

"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> You weren't by any chance running with full_page_writes = off
>> were you?

> Yes we were.  Apparently I have misunderstood the implications of this.

So had we all :-(.  It just plain doesn't work in 8.1.*, and will be
disabled in 8.1.4 --- see discussion last week.  It might or might not
get resurrected in 8.2, depending on how messy it seems to be to fix.

Anyway, that explains your "heap_clean_redo: no block" failure.  I think
you're stuck risking a pg_resetxlog to try to get back into the
database.  If that results in a hopelessly corrupt database, we can try
modifying the WAL replay code to not consider this a fatal error, and
see if that produces anything we can use for debugging.  I'm glad this
isn't your only copy of the database ...

> There is a lot of data in the database which is confidential by law.
> I'd have to jump through a lot of hoops to get anyone to even consider
> letting me ship it off site.  If you're asking whether you could access
> in to our site, that might be arranged.

It sounds like the DB is too big to consider shipping anywhere anyway.
As long as you're comfortable doing stuff like pg_filedump and modifying
the code to get more debug info, we can proceed without getting into the
question of remote access.

            regards, tom lane

Re: right sibling is not next child

From

"Kevin Grittner"

Date:

06 April 2006, 18:45:39

>>> On Thu, Apr 6, 2006 at  1:26 pm, in message
<26154.1144348014@sss.pgh.pa.us>,
> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
>> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> You weren't by any chance running with full_page_writes = off
>>> were you?
>
>> Yes we were.  Apparently I have misunderstood the implications of
this.
>
> So had we all :- (.  It just plain doesn't work in 8.1.*, and will
be
> disabled in 8.1.4 ---  see discussion last week.

Dang!  I've not been able to keep up with the lists, and I missed that
thread.  I'll change that setting as soon as possible.

> Anyway, that explains your "heap_clean_redo: no block" failure.  I
think
> you're stuck risking a pg_resetxlog to try to get back into the
> database.  If that results in a hopelessly corrupt database, we can
try
> modifying the WAL replay code to not consider this a fatal error,
and
> see if that produces anything we can use for debugging.

Will do.  Before I do that, though, is it worth making a copy?  (The
down side is primarily the delay of copying it.)

> I'm glad this isn't your only copy of the database ...

We go beyond being a belt-and-suspenders shop.  Think staples and
glue-gun, too.  ;-)

This is one of four copies of what is redundant data to start with, and
we have backups.  So recovery is no sweat, but we got emails about
timeouts from the public during this incident, so the underlying bug is
important to us.

> As long as you're comfortable doing stuff like pg_filedump and
modifying
> the code to get more debug info, we can proceed without getting into
the
> question of remote access.

OK.  If you later think we do need to go that direction, send me an
email off-list.

-Kevin

Re: right sibling is not next child

From

Tom Lane

Date:

06 April 2006, 18:48:28

"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Anyway, that explains your "heap_clean_redo: no block" failure.  I think
>> you're stuck risking a pg_resetxlog to try to get back into the
>> database.

> Will do.  Before I do that, though, is it worth making a copy?  (The
> down side is primarily the delay of copying it.)

Yeah, I think you'd better --- pg_resetxlog is destructive, and if we
find the database unusable for debugging afterward, we'll need that
copy.

            regards, tom lane