Hanging backends and possible index corruption

From: Bernd Helmle
Date: 2013-01-25 16:24:52 +0100
We are currently analyzing an issue with one of our customers' PostgreSQL 
databases.

The current version is 9.1.6 (an update to 9.1.7 is scheduled for next 
Monday; no downtime is possible before then). It runs on POWER7 (pSeries 740) 
in an RHEL 6.3 64-bit LPAR. The packages are built from PGDG SVN sources with 
no special tweaks added. We have seen no hardware-related errors on this 
machine, nor any crashes.

What currently happens on this machine are hanging statements (SELECTs and 
occasionally INSERTs) at 100% CPU. After some investigation it turned out 
that the corresponding backends are seeking within an index file over and 
over again in a loop. Looking into the hanging queries, I noticed that 
certain keys seem to trigger the problem, while other keys used in the WHERE 
condition run smoothly. Turning off index and bitmap index scans caused the 
suspicious keys to return results, too.
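
For illustration, the session-level check looked roughly like this (table, 
column and key value are anonymized placeholders, not the customer's real 
objects):

    BEGIN;
    SET LOCAL enable_indexscan = off;
    SET LOCAL enable_bitmapscan = off;

    -- with index and bitmap scans disabled, the suspicious key returns rows again
    EXPLAIN ANALYZE
    SELECT * FROM some_table WHERE some_bigint_col = 12345;

    ROLLBACK;  -- the planner settings were only set for this transaction

SET LOCAL keeps the change confined to the test transaction, so concurrent 
sessions continue to use the index as before.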

So I saved the index file aside (a normal B-tree index on a single bigint 
column), did a REINDEX, and the problem was gone. Looking at the saved index 
file with pg_filedump and Alvaro's pgbtreecheck gave me the following output:

pgbtreecheck gives warnings about pages' parents and then loops visiting 
the same pages over and over again:

NOTICE: fast root: block 290 at level 2
NOTICE: setting page parents
WARNING: block 12709 already had a parent (8840); new parent is 12177
WARNING: block 12710 already had a parent (12439); new parent is 10835
NOTICE: done setting parents
NOTICE: Checking forward scan of level 0, starting at block 1

-- loop starts

WARNING: right sibling 12710 does not point left to me (11680); points to 
10924 instead
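
For reference, locating the on-disk file that was copied aside, and the 
rebuild itself, amount to something like the following sketch (the index name 
is a placeholder; the saved copy's name, 100252789, is presumably the index's 
relfilenode):

    -- path of the index file, relative to the data directory
    SELECT pg_relation_filepath('some_index');

    -- the data directory to copy it from
    SHOW data_directory;

    -- rebuild the index; writes to the table are blocked while this runs
    REINDEX INDEX some_index;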

Looking into the relevant pages and their prev and next pointers gives me 
the following:

pg_filedump -i -R 11680 ~/tmp/100252789 | grep Blocks
 Blocks: Previous (12651)  Next (12710)  Level (0)  CycleId (0)

pg_filedump -i -R 10924 ~/tmp/100252789 | grep Blocks
 Blocks: Previous (10923)  Next (12710)  Level (0)  CycleId (0)

pg_filedump -i -R 12710 ~/tmp/100252789 | grep Blocks
 Blocks: Previous (10924)  Next (10925)  Level (0)  CycleId (0)

pg_filedump -i -R 10925 ~/tmp/100252789 | grep Blocks
 Blocks: Previous (12710)  Next (10926)  Level (0)  CycleId (0)

pg_filedump -i -R 12709 ~/tmp/100252789 | grep Blocks
 Blocks: Previous (8849)  Next (8850)  Level (0)  CycleId (0)

pg_filedump -i -R 8840 ~/tmp/100252789 | grep Blocks
 Blocks: Previous (8555)  Next (9125)  Level (1)  CycleId (0)

pg_filedump -i -R 12439 ~/tmp/100252789 | grep Blocks
 Blocks: Previous (11405)  Next (11690)  Level (1)  CycleId (0)

pg_filedump -i -R 12177 ~/tmp/100252789 | grep Blocks
 Blocks: Previous (11690)  Next (0)  Level (1)  CycleId (0)

pg_filedump -i -R 10835 ~/tmp/100252789 | grep Blocks
 Blocks: Previous (10550)  Next (11120)  Level (1)  CycleId (0)
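
The invariant pgbtreecheck complains about is visible directly in these 
pointers: every live leaf page's right sibling should point back to it, yet 
block 12710's left link goes to 10924 rather than 11680. As a rough sketch of 
that leaf-chain check, runnable only against a live index (not a saved file 
copy) and using the pageinspect extension with a placeholder index name; this 
is an illustration of the invariant, not pgbtreecheck itself:

    CREATE EXTENSION IF NOT EXISTS pageinspect;

    DO $$
    DECLARE
        idx     text := 'some_index';   -- placeholder index name
        nblocks int;
        cur     record;
        sib     record;
    BEGIN
        nblocks := pg_relation_size(idx::regclass) / current_setting('block_size')::int;
        FOR blk IN 1 .. nblocks - 1 LOOP    -- block 0 is the btree metapage
            SELECT * INTO cur FROM bt_page_stats(idx, blk);
            -- only live leaf pages take part in the level-0 sibling chain
            CONTINUE WHEN cur.type <> 'l' OR cur.btpo_next = 0;
            SELECT * INTO sib FROM bt_page_stats(idx, cur.btpo_next::int);
            IF sib.btpo_prev <> blk THEN
                RAISE WARNING 'right sibling % of block % points left to % instead',
                              cur.btpo_next, blk, sib.btpo_prev;
            END IF;
        END LOOP;
    END;
    $$;

On current PostgreSQL releases the contrib module amcheck (bt_index_check()) 
is the supported way to run this kind of B-tree verification.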

This symptom has now happened three times in the last couple of weeks. 
Looking at the numbers doesn't give me the impression that flaky hardware 
is involved. What else can we do to track down this problem? Any 
suggestions?

-- 
Thanks
Bernd



Re: Hanging backends and possible index corruption

From: Andres Freund
Date: 2013-01-25 16:28:16 +0100
On 2013-01-25 16:24:52 +0100, Bernd Helmle wrote:
> We are currently analyzing an issue with one of our customers' PostgreSQL
> databases.
> 
> The current version is 9.1.6 (an update to 9.1.7 is scheduled for next Monday;
> no downtime is possible before then). It runs on POWER7 (pSeries 740) in an
> RHEL 6.3 64-bit LPAR. The packages are built from PGDG SVN sources with no
> special tweaks added. We have seen no hardware-related errors on this machine,
> nor any crashes.
> 
> What currently happens on this machine are hanging statements (SELECTs and
> occasionally INSERTs) at 100% CPU. After some investigation it turned out
> that the corresponding backends are seeking within an index file over and
> over again in a loop. Looking into the hanging queries, I noticed that
> certain keys seem to trigger the problem, while other keys used in the WHERE
> condition run smoothly. Turning off index and bitmap index scans caused the
> suspicious keys to return results, too.
> 
> [...]
>
> This symptom has now happened three times in the last couple of weeks.
> Looking at the numbers doesn't give me the impression that flaky hardware
> is involved. What else can we do to track down this problem? Any
> suggestions?

Did you reindex after upgrading to 9.1.6? Did you ever have any crashes
or failovers before upgrading to 9.1.6?
I have seen pretty similar symptoms caused by the bug addressed by "Fix
persistence marking of shared buffers during WAL replay" in 9.1.6.

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Hanging backends and possible index corruption

From: Bernd Helmle
Date:

--On 25 January 2013 16:28:16 +0100 Andres Freund <andres@2ndquadrant.com> 
wrote:

> Did you reindex after upgrading to 9.1.6? Did you ever have any crashes
> or failovers before upgrading to 9.1.6?
> I have seen pretty similar symptoms caused by "Fix persistence marking
> of shared buffers during WAL replay" in 9.1.6.

Hmm, so far it has happened only on a single, heavily used table, and this 
table was REINDEXed twice after updating to 9.1.6 (each time the issue occurred).

-- 
Thanks
Bernd



Re: Hanging backends and possible index corruption

From: Tom Lane
Date: 2013-01-25 20:37:32 -0500
Bernd Helmle <mailings@oopsware.de> writes:
> So I saved the index file aside (a normal B-tree index on a single bigint
> column), did a REINDEX, and the problem was gone. Looking at the saved index
> file with pg_filedump and Alvaro's pgbtreecheck gave me the following output:
> ...

Don't know how careful pgbtreecheck is.  The pg_filedump output isn't
very helpful because you filtered away the flags, so we can't tell if
any of these pages are deleted.  (If they are, the duplicate-looking
links might not be errors, since we intentionally don't reset a deleted
page's left/right links when deleting it.)

Could we see the whole special-space dump for each of the pages you're
worried about?
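
As an aside, and only applicable while a suspect index is still attached to a
running server rather than saved off as a file copy: the pageinspect
extension exposes the same special-space fields, flags included, so a deleted
page would show up there as well. A hypothetical example with a placeholder
index name:

    CREATE EXTENSION IF NOT EXISTS pageinspect;

    -- type is 'l' for a live leaf page and 'd' for a deleted one;
    -- btpo_flags holds the raw flag bits from the page's special space
    SELECT blkno, type, btpo_prev, btpo_next, btpo_flags
    FROM bt_page_stats('some_index', 12710);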

One thought that occurs to me is that POWER is a weak-memory-ordering
architecture, so that it's a tenable idea that this has something to do
with changing page links while not holding sufficient lock on the page.
I don't see btree doing that anywhere, but ...

BTW, how long has this installation been around, and when did you start
seeing funny behavior?  Can you say with reasonable confidence that the
bug was *not* present in any older PG versions?
        regards, tom lane



Re: Hanging backends and possible index corruption

From: Bernd Helmle
Date:

--On 25 January 2013 20:37:32 -0500 Tom Lane <tgl@sss.pgh.pa.us> wrote:

>
> Don't know how careful pgbtreecheck is.  The pg_filedump output isn't
> very helpful because you filtered away the flags, so we can't tell if
> any of these pages are deleted.  (If they are, the duplicate-looking
> links might not be errors, since we intentionally don't reset a deleted
> page's left/right links when deleting it.)
>

Ah, wasn't aware of this.

> Could we see the whole special-space dump for each of the pages you're
> worried about?
>

Attached

> One thought that occurs to me is that POWER is a weak-memory-ordering
> architecture, so that it's a tenable idea that this has something to do
> with changing page links while not holding sufficient lock on the page.
> I don't see btree doing that anywhere, but ...
>
> BTW, how long has this installation been around, and when did you start
> seeing funny behavior?  Can you say with reasonable confidence that the
> bug was *not* present in any older PG versions?
>

This machine went into production around August last year with (AFAIR)
9.1.5. There were also performance and stress tests on this machine before
it went into production, with no noticeable problems.

However, what I missed before is that there was some trouble with the
storage multipaths. It seems that in early January the machine lost some of
its paths to the SAN, but they recovered a few seconds later... so I cannot
exclude this as the cause anymore, though the paths are redundant. What
strikes me is that the index was recreated in the meantime, after that
incident...

We will watch this machine more closely over the next couple of weeks to see
whether the issue comes back.

--
Thanks

    Bernd