Thread: TRAP: FailedAssertion("!((itemid)->lp_flags & 0x01)", File: "nbtsearch.c", Line: 89)

I've got a customer who was having a problem with a backend running away
with memory. It would hit 46G before finally being running the box
completely out of memory. It didn't appear to be related to hashjoin/agg
or pending triggers, so I had them recompile with debug and assert
turned on. A bit after restarting we get:

TRAP: FailedAssertion("!((itemid)->lp_flags & 0x01)", File: "nbtsearch.c", Line: 89)

Looking through the code I see that's something to do with indexes, but
I'm not sure what. Is this likely a corrupted index? If so, is there
some way I could identify which index?
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


"Jim C. Nasby" <jnasby@pervasive.com> writes:
> TRAP: FailedAssertion("!((itemid)->lp_flags & 0x01)", File: "nbtsearch.c", Line: 89)

> Looking through the code I see that's something to do with indexes, but
> I'm not sure what. Is this likely a corrupted index?

Sounds that way.

> If so, is there some way I could identify which index?

gdb the core file and do "p rel->rd_rel->relname" in the _bt_search
stack frame (which is not going to be the top of stack but should be
close to the top).
        regards, tom lane


On Wed, Oct 26, 2005 at 04:48:22PM -0400, Tom Lane wrote:
> "Jim C. Nasby" <jnasby@pervasive.com> writes:
> > TRAP: FailedAssertion("!((itemid)->lp_flags & 0x01)", File: "nbtsearch.c", Line: 89)
> 
> > Looking through the code I see that's something to do with indexes, but
> > I'm not sure what. Is this likely a corrupted index?
> 
> Sounds that way.

Is this something that should be logged better than it is? Is there any
value in trying to save the index/table that's busted?

This is 8.0.3, btw.

> > If so, is there some way I could identify which index?
> 
> gdb the core file and do "p rel->rd_rel->relname" in the _bt_search
> stack frame (which is not going to be the top of stack but should be
> close to the top).

Yeah, looks like no core files yet (they would be in $PGDATA somewhere,
correct?), so I had them hack their startup script to include ulimit -c
unlimited. Luckily this is pretty reproducable, so hopefully we'll have
a core soon.
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


"Jim C. Nasby" <jnasby@pervasive.com> writes:
> On Wed, Oct 26, 2005 at 04:48:22PM -0400, Tom Lane wrote:
>> Sounds that way.

> Is this something that should be logged better than it is?

We don't even know what it is yet, so that question seems a bit premature.

> Is there any value in trying to save the index/table that's busted?

It'd probably be worth making a copy for forensic purposes before you
REINDEX it.
        regards, tom lane


Re: TRAP: FailedAssertion("!((itemid)->lp_flags & 0x01)", File: "nbtsearch.c", Line: 89)

From
Martijn van Oosterhout
Date:
On Wed, Oct 26, 2005 at 04:03:41PM -0500, Jim C. Nasby wrote:
> Is this something that should be logged better than it is? Is there any
> value in trying to save the index/table that's busted?

Unfortuatly, it's one of those Assert(expr) lines, they don't generally
give the opportunity for extra debug data. They're usually not even
compiled in...

> Yeah, looks like no core files yet (they would be in $PGDATA somewhere,
> correct?), so I had them hack their startup script to include ulimit -c
> unlimited. Luckily this is pretty reproducable, so hopefully we'll have
> a core soon.

Well, you know the query that generates it? EXPLAIN should tell you the
indexes it's using, so you could just REINDEX those...

Or REINDEX them all... :)

Hope this helps,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

On Wed, Oct 26, 2005 at 11:14:17PM +0200, Martijn van Oosterhout wrote:
> On Wed, Oct 26, 2005 at 04:03:41PM -0500, Jim C. Nasby wrote:
> > Is this something that should be logged better than it is? Is there any
> > value in trying to save the index/table that's busted?
> 
> Unfortuatly, it's one of those Assert(expr) lines, they don't generally
> give the opportunity for extra debug data. They're usually not even
> compiled in...
> 
> > Yeah, looks like no core files yet (they would be in $PGDATA somewhere,
> > correct?), so I had them hack their startup script to include ulimit -c
> > unlimited. Luckily this is pretty reproducable, so hopefully we'll have
> > a core soon.
> 
> Well, you know the query that generates it? EXPLAIN should tell you the
> indexes it's using, so you could just REINDEX those...

Well, since query logging only logs when a query finishes...

> Or REINDEX them all... :)

242G database. That would take quite a while... :)
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


"Jim C. Nasby" <jnasby@pervasive.com> writes:
> On Wed, Oct 26, 2005 at 11:14:17PM +0200, Martijn van Oosterhout wrote:
>> Well, you know the query that generates it?

> Well, since query logging only logs when a query finishes...

Don't forget to look at debug_query_string when you get the core file.
        regards, tom lane


On Wed, Oct 26, 2005 at 05:47:15PM -0400, Tom Lane wrote:
> "Jim C. Nasby" <jnasby@pervasive.com> writes:
> > On Wed, Oct 26, 2005 at 11:14:17PM +0200, Martijn van Oosterhout wrote:
> >> Well, you know the query that generates it?
> 
> > Well, since query logging only logs when a query finishes...
> 
> Don't forget to look at debug_query_string when you get the core file.

Didn't know about that, it will certainly help.

There's nothing in the FAQ's about what to look for in a coredump is
there?
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


On Wed, Oct 26, 2005 at 04:48:22PM -0400, Tom Lane wrote:
> "Jim C. Nasby" <jnasby@pervasive.com> writes:
> > TRAP: FailedAssertion("!((itemid)->lp_flags & 0x01)", File: "nbtsearch.c", Line: 89)
> 
> > Looking through the code I see that's something to do with indexes, but
> > I'm not sure what. Is this likely a corrupted index?
> 
> Sounds that way.
> 
> > If so, is there some way I could identify which index?
> 
> gdb the core file and do "p rel->rd_rel->relname" in the _bt_search
> stack frame (which is not going to be the top of stack but should be
> close to the top).

Reproduced the crash, but still no core file... where exactly should it
have been put? Would in be outside of $PGDATA?

Is there any way to verify the ulimit setings that the backend is
running under?
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


"Jim C. Nasby" <jnasby@pervasive.com> writes:
> Reproduced the crash, but still no core file... where exactly should it
> have been put? Would in be outside of $PGDATA?

In 8.0 I'd expect to find it in $PGDATA/base/DBOID/core (or possibly
core.NNNN).  There are some platforms like Darwin that tend to put
core files in a fixed directory such as /cores, though.
        regards, tom lane


On Wed, Oct 26, 2005 at 06:38:45PM -0400, Tom Lane wrote:
> "Jim C. Nasby" <jnasby@pervasive.com> writes:
> > Reproduced the crash, but still no core file... where exactly should it
> > have been put? Would in be outside of $PGDATA?
> 
> In 8.0 I'd expect to find it in $PGDATA/base/DBOID/core (or possibly
> core.NNNN).  There are some platforms like Darwin that tend to put
> core files in a fixed directory such as /cores, though.

Grr... /etc/profile had
ulimit -S -c 0 > /dev/null 2>&1

Is there any way to verify what limits are in place for a running
backend?
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


On Wed, Oct 26, 2005 at 06:06:19PM -0500, Jim C. Nasby wrote:
> On Wed, Oct 26, 2005 at 06:38:45PM -0400, Tom Lane wrote:
> > "Jim C. Nasby" <jnasby@pervasive.com> writes:
> > > Reproduced the crash, but still no core file... where exactly should it
> > > have been put? Would in be outside of $PGDATA?
> > 
> > In 8.0 I'd expect to find it in $PGDATA/base/DBOID/core (or possibly
> > core.NNNN).  There are some platforms like Darwin that tend to put
> > core files in a fixed directory such as /cores, though.
> 
> Grr... /etc/profile had
> ulimit -S -c 0 > /dev/null 2>&1
> 
> Is there any way to verify what limits are in place for a running
> backend?

Also, is an assert guaranteed to dump core?
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


"Jim C. Nasby" <jnasby@pervasive.com> writes:
>> Is there any way to verify what limits are in place for a running
>> backend?

Damifino.

> Also, is an assert guaranteed to dump core?

Yup ... at least, it will call abort().
        regards, tom lane


"Jim C. Nasby" <jnasby@pervasive.com> writes:
> Is there anything in particular you'd like to see from the index file? I
> made a copy of it before reindexing...

Could you send me the whole file (off-list)?
        regards, tom lane


"Jim C. Nasby" <jnasby@pervasive.com> writes:
> On Wed, Oct 26, 2005 at 09:29:23PM -0400, Tom Lane wrote:
>> Could you send me the whole file (off-list)?

> Ok, will send URL as soon as I have it from client.

Well, the answer is that there's nothing wrong with that index except
that four consecutive pages near the end (32K total) have been zeroed
out :-(

In the situation where _bt_search descends into one of these pages,
we'd have the following behavior:

* P_RIGHTMOST() appears true because btpo_next is zero, therefore
_bt_moveright does nothing.

* P_ISLEAF() appears false because the flags word is zero, so we
don't fall out of the loop.

* _bt_binsrch finds high < low and returns low (ie, 1) rather than complaining.

* If Asserts are on then the internal ItemIdIsUsed assert in PageGetItem
triggers, resulting in the behavior mentioned in $subject.

* If not, then we stupidly fetch a zero block number out of the
nonexistent item, and iterate to page zero, ie, the metapage.
The above observations still hold true for the metapage, so
it's an infinite loop --- or would be except that we're building
a stack entry each time around the loop, and so we gradually
exhaust memory.  This matches the other behavior Jim saw.

Bottom line is that index searches probably ought to have some
non-Assert defenses against zeroed-out pages.  Obviously we can't
expect to catch every flavor of data corruption, but this particular
one has been seen before...

BTW, Jim, any thoughts about how the index got corrupted?  Have you
had any crashes on that machine lately?
        regards, tom lane


On Thu, Oct 27, 2005 at 11:53:01PM -0400, Tom Lane wrote:
> BTW, Jim, any thoughts about how the index got corrupted?  Have you
> had any crashes on that machine lately?

Write-through cache on drive array that's not battery backed. Plus, the
backend has been crashing on a sig 11 about once a week for
who-knows-how-long. You do the math...

BTW, now that the backend is compiled with --enable-debug I hope to find
out the reason for the random crashes.
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


Gavin Sherry <swm@linuxworld.com.au> writes:
> Definately. I've seen faulty hardware somehow zero blocks where I would
> have expected random data. I wonder if we can test with PageIsNew(), which
> is very inexpensive. The question is: what do we do when we detect this?

I think erroring out with a message along the line of "corrupted data in
index foo" is plenty.  The recovery action is easy: reindex.  So all we
have to do is not crash and deliver a useful error message.

We know that the page has already gotten by PageHeaderIsValid, so either
it's in reasonable condition or it's all zeroes.  So a quick check for
some nonzero header fields is enough ... PageIsNew is as good as
anything.

The next question is what's the minimal number of places we have to add
it to to cover all paths in the index modules?

> That being said, is there any reason where don't extend the file with a
> PageInit()'d block instead of a zero'd file?

Yeah: this scenario is exactly it.  PageInit doesn't make for a valid
index page.
        regards, tom lane


"Jim C. Nasby" <jnasby@pervasive.com> writes:
> On Thu, Oct 27, 2005 at 11:53:01PM -0400, Tom Lane wrote:
>> BTW, Jim, any thoughts about how the index got corrupted?  Have you
>> had any crashes on that machine lately?

> Write-through cache on drive array that's not battery backed. Plus, the
> backend has been crashing on a sig 11 about once a week for
> who-knows-how-long. You do the math...

My guess is that this could not have been caused by a backend crash.
I did some analysis and found that one of the four pages had been a
level-1 internal page, not a leaf page (there is a sibling link from
a level-1 page and a downlink from the level-2 root, QED).  It's
very hard to see how to explain the condition of the index as a
result of a single backend crash, and even if you posit one crash
for each page, it's tough to believe that there wouldn't be additional
corruption elsewhere (eg, from incomplete page split operations).  But
AFAICT there's nothing at all wrong anywhere else.  And if there were
multiple crashes, why are the affected pages contiguous?

I think the data was dropped at the disk or filesystem level.  If you
have had power failures then that's certainly not hard to believe.

> BTW, now that the backend is compiled with --enable-debug I hope to find
> out the reason for the random crashes.

Yeah, that is an interesting question.
        regards, tom lane


Re: TRAP: FailedAssertion("!((itemid)->lp_flags & 0x01)",

From
Gavin Sherry
Date:
On Thu, 27 Oct 2005, Tom Lane wrote:

> "Jim C. Nasby" <jnasby@pervasive.com> writes:
> > On Wed, Oct 26, 2005 at 09:29:23PM -0400, Tom Lane wrote:
> >> Could you send me the whole file (off-list)?
>
> > Ok, will send URL as soon as I have it from client.
>
> Well, the answer is that there's nothing wrong with that index except
> that four consecutive pages near the end (32K total) have been zeroed
> out :-(

[snip]

> Bottom line is that index searches probably ought to have some
> non-Assert defenses against zeroed-out pages.  Obviously we can't
> expect to catch every flavor of data corruption, but this particular
> one has been seen before...

Definately. I've seen faulty hardware somehow zero blocks where I would
have expected random data. I wonder if we can test with PageIsNew(), which
is very inexpensive. The question is: what do we do when we detect this?

>
> BTW, Jim, any thoughts about how the index got corrupted?  Have you
> had any crashes on that machine lately?

Have spoken with Jim on IRC, he says that there have been several crashes
recently due to a faulty disk array. I guess the zeroing could be an
outcome of the faulty disk. I wonder if the crash the faulty disk resulted
in could have been caused some where around mdextend() where we create a
zero'd page but before we could have written out the initialised page.

If this happened 4 times in a row it could account for the problem. It
does seem a bit unlikely thought.

That being said, is there any reason where don't extend the file with a
PageInit()'d block instead of a zero'd file?

Thanks,

Gavin


Re: TRAP: FailedAssertion("!((itemid)->lp_flags & 0x01)",

From
"Jim C. Nasby"
Date:
On Fri, Oct 28, 2005 at 02:26:31PM +1000, Gavin Sherry wrote:
> Have spoken with Jim on IRC, he says that there have been several crashes
> recently due to a faulty disk array. I guess the zeroing could be an
> outcome of the faulty disk. I wonder if the crash the faulty disk resulted
> in could have been caused some where around mdextend() where we create a
> zero'd page but before we could have written out the initialised page.

Just to clarify, there's no evidence that the array is faulty. I do know
that they were using write-back with a non-battery-backed cache though.

What has been happening is periodic random crashes, around 1 a week. I
now have a good core for one, as well as an assert:

TRAP: FailedAssertion("!(shared->page_number[slotno] == pageno &&
shared->page_status[slotno] == SLRU_PAGE_READ_IN_PROGRESS)", File:
"slru.c", Line: 308)

I haven't looked at that code yet, so I have no idea what that actually
means. Let me know what info y'all would like to see out of the core.
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


Re: TRAP: FailedAssertion("!((itemid)->lp_flags & 0x01)",

From
Tom Lane
Date:
"Jim C. Nasby" <jnasby@pervasive.com> writes:
> What has been happening is periodic random crashes, around 1 a week. I
> now have a good core for one, as well as an assert:

> TRAP: FailedAssertion("!(shared->page_number[slotno] == pageno &&
> shared->page_status[slotno] == SLRU_PAGE_READ_IN_PROGRESS)", File:
> "slru.c", Line: 308)

> I haven't looked at that code yet, so I have no idea what that actually
> means. Let me know what info y'all would like to see out of the core.

The whole contents of *shared and the local variables of
SimpleLruReadPage would be good for starters.

Also, what PG version is this exactly, again?
        regards, tom lane