Re: Another possible corruption bug in 9.3.2 or possibly a known MultiXact problem? - Mailing list pgsql-hackers

Hi,

On 2014-02-24 17:55:14 -0300, Alvaro Herrera wrote:
> Greg Stark wrote:
> > I have a database where a a couple rows don't appear in index scans
> > but do appear in sequential scans. It looks like the same problem as
> > Peter reported but this is a different database. I've extracted all
> > the xlogdump records and below are the ones I think are relevant. You
> > can see that lp 2 gets a few HOT updates and concurrently has someone
> > create a MultiXact NO KEY UPDATE lock while one of those HOT updates
> > is pending but not committed.

Per se the sequence of records doesn't look bad (even though I am not
happy that we log intermediate and final rows first, and only then the
start of the chain).

> > The net result seems to be that the ctid
> > update chain got broken. The index of course points to the head of the
> > HOT chain so it doesn't find the live tail whereas the sequential scan
> > picks it up.

Yea, that's the problem.

> > rmgr: Heap        len (rec/tot):    291/   323, tx:    5943849, lsn: FD/2F0ADFC0, prev FD/2F0ADF90, bkp: 0000,
desc:hot_update: rel 1663/16385/212653; tid 13065/2 xmax 5943849 ; new tid 13065/3 xmax 0
 
> > rmgr: Heap2       len (rec/tot):     25/    57, tx:    5943851, lsn: FD/2F0AE450, prev FD/2F0AE408, bkp: 0000,
desc:lock updated: xmax 5943851 msk 000a; rel 1663/16385/212653; tid 13065/3
 
> > rmgr: MultiXact   len (rec/tot):     28/    60, tx:    5943851, lsn: FD/2F0AE490, prev FD/2F0AE450, bkp: 0000,
desc:create mxid 728896 offset 1632045 nmembers 2: 5943849 (nokeyupd) 5943851 (keysh)
 
> > rmgr: Heap        len (rec/tot):     25/    57, tx:    5943851, lsn: FD/2F0AE4D0, prev FD/2F0AE490, bkp: 0000,
desc:lock 728896: rel 1663/16385/212653; tid 13065/2 IS_MULTI EXCL_LOCK
 

> >  lp | lp_off | lp_flags | lp_len | t_xmin  | t_xmax  | t_field3 |   t_ctid   | t_infomask2 | t_infomask | t_hoff |
> >
----+--------+----------+--------+---------+---------+----------+------------+-------------+------------+--------+-
> >   2 |   3424 |        1 |    232 | 5943845 |  728896 |        0 | (13065,2)  |          32 |       4419 |     32 |
> >   3 |   3152 |        1 |    272 | 5943849 | 5943879 |        0 | (13065,4)  |       49184 |       9475 |     32 |
> >   4 |   2864 |        1 |    287 | 5943879 | 5943880 |        0 | (13065,7)  |       49184 |       9475 |     32 |
> >   7 |   2576 |        1 |    287 | 5943880 |       0 |        0 | (13065,7)  |       32800 |      10499 |     32 |

Those together explain the story. Note this bit:

static void
heap_xlog_lock(XLogRecPtr lsn, XLogRecord *record)
{
...   HeapTupleHeaderClearHotUpdated(htup);   HeapTupleHeaderSetXmax(htup, xlrec->locking_xid);
HeapTupleHeaderSetCmax(htup,FirstCommandId, false);   /* Make sure there is no forward chain link in t_ctid */
htup->t_ctid= xlrec->target.tid;
 
...
}

So, the replay of FD/2F0AE4D0 breaks the ctid chain *and* unsets the
HOT_UPDATED flag.

Which means fkey locks have never properly worked across SR/crash
recovery.

Haven't thought about how to fix it yet, I hope won't have to (hint hint).

We somehow need to have a policy of testing changes to the WAL format
without full_page_writes. They hide bugs in replay far, far too often.


Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



pgsql-hackers by date:

Previous
From: Andrew Dunstan
Date:
Subject: Re: jsonb and nested hstore
Next
From: Andres Freund
Date:
Subject: Re: Another possible corruption bug in 9.3.2 or possibly a known MultiXact problem?