Home > mailing lists

Re: Index corruption - Mailing list pgsql-hackers

From	Tom Lane
Subject	Re: Index corruption
Date	June 29, 2006 23:27:18
Msg-id	18191.1151623627@sss.pgh.pa.us Whole thread Raw
In response to	Re: Index corruption (Tom Lane <tgl@sss.pgh.pa.us>)
List	pgsql-hackers

Tree view

I wrote:
> What I speculate right at the moment is that we are not looking at index
> corruption at all, but at heap corruption: somehow, the first insertion
> into ctid (27806,2) got lost and the same ctid got re-used for the next
> inserted row.  We fixed one bug like this before ...

Further study makes this look more probable.  It seems that most of the
activity around the trouble spot consists of transactions inserting
exactly 13 rows into the table; for instance here are traces of two
successive transactions:

Block    Item    XMIN        CMIN

27800    7    600921860    3
27800    8    600921860    7
27800    9    600921860    11
27800    10    600921860    15
27800    11    600921860    19
27800    12    600921860    23
27800    13    600921860    27
27800    14    600921860    31
27800    15    600921860    35
27800    16    600921860    39
27800    17    600921860    43
27800    18    600921860    47
27800    19    600921860    51

27800    20    600921870    3
27800    21    600921870    7
27800    22    600921870    11
27800    23    600921870    15
27800    24    600921870    19
27800    25    600921870    23
27800    26    600921870    27
27800    27    600921870    31
27800    28    600921870    35
27800    29    600921870    39
27800    30    600921870    43
27800    31    600921870    47
27800    32    600921870    51

The pattern of CMIN values is the same for all these transactions.
But look at the rows inserted by 600921856 and 600921858, the two
transactions that seem to be involved with the problem on page
27086:

27787    50    600921856    3
27795    41    600921858    3
27795    42    600921858    7
27795    43    600921858    11
27795    44    600921858    15
27795    45    600921858    19
27795    46    600921858    23
27795    47    600921858    27
27795    48    600921858    31
27795    49    600921858    35
27795    50    600921858    39
27795    51    600921858    43
27806    1    600921858    47
27806    2    600921856    15
27806    3    600921856    19
27806    4    600921856    23
27806    5    600921856    27
27806    6    600921856    31
27806    7    600921856    35
27806    8    600921856    39
27806    9    600921856    43
27806    10    600921856    47
27806    11    600921856    51
27806    12    600921858    51

(27787,50) and (27795,51) are both the last rows on their pages.  What's
evidently happened is that the two transactions filled those pages and
then both seized on 27806 as the next page to insert into.  I think that
600921856 tried to insert its CMIN 7 and 11 rows as items 1 and 2 on
that page, and then something wiped the page, then 600921858 inserted
its CMIN 47 row as item 1, and then 600921856 got control back and
finished inserting its rows.  Further study of the indexes shows that
there are two entries in each index pointing to each of (27806,1) and
(27806,2) --- but since the xxid values are different for the two
(27806,1) entries, those didn't show up as duplicates in my first look.

Given the apparent connection to vacuuming, this is looking a WHOLE
lot like this bug fixed in 8.0.3:

2005-05-07 17:32  tgl
* src/backend/: access/heap/hio.c, commands/vacuumlazy.c(REL7_3_STABLE), access/heap/hio.c,
access/nbtree/nbtpage.c,access/nbtree/nbtree.c,commands/vacuumlazy.c (REL7_4_STABLE),access/heap/hio.c,
commands/vacuumlazy.c(REL7_2_STABLE),access/heap/hio.c, access/nbtree/nbtpage.c,
access/nbtree/nbtree.c,commands/vacuumlazy.c(REL8_0_STABLE), access/heap/hio.c,access/nbtree/nbtpage.c,
access/nbtree/nbtree.c,commands/vacuumlazy.c:Repair very-low-probability race conditionbetween relation extension and
VACUUM:in the interval betweenadding a new page to the relation and formatting it, it waspossible for VACUUM to come
alongand decide it should format thepage too.  Though not harmful in itself, this would cause data lossif a third
transactionwere able to insert tuples into the vacuumedpage before the original extender got control back.

Are you *certain* this slave isn't running 8.0.2 or older?  If you can
verify that, then I guess we need to look for another mechanism that
could cause the same kind of thing.
        regards, tom lane

pgsql-hackers by date:

From: Marc Munro
Date: 29 June 2006, 22:06:28
Subject: Re: Index corruption

From: Marc Munro
Date: 29 June 2006, 23:53:06
Subject: [Re: Index corruption]

Re: Index corruption - Mailing list pgsql-hackers

Previous

Next