Re: HeapTupleSatisfiesToast() busted? (was atomic pin/unpin causing errors) - Mailing list pgsql-hackers

From Robert Haas
Subject Re: HeapTupleSatisfiesToast() busted? (was atomic pin/unpin causing errors)
Date
Msg-id CA+TgmoZ0PzoGMRBU-NOEx8YLEf5LeBF8TXiht5r0zeEVNpeT7g@mail.gmail.com
Whole thread Raw
In response to Re: HeapTupleSatisfiesToast() busted? (was atomic pin/unpin causing errors)  (Andres Freund <andres@anarazel.de>)
Responses Re: HeapTupleSatisfiesToast() busted? (was atomic pin/unpin causing errors)  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On Tue, May 10, 2016 at 3:05 AM, Andres Freund <andres@anarazel.de> wrote:
> The easy way to trigger this problem would be to have an oid wraparound
> - but the WAL shows that that's not the case here.  I've not figured
> that one out entirely (and won't tonight). But I do see WAL records
> like:
> rmgr: XLOG        len (rec/tot):      4/    30, tx:          0, lsn: 2/12004018, prev 2/12003288, desc: NEXTOID
4302693
> rmgr: XLOG        len (rec/tot):      4/    30, tx:          0, lsn: 2/1327EA08, prev 2/1327DC60, desc: NEXTOID
4302693
> i.e. two NEXTOID records allocating the same range, which obviously
> doesn't seem right.  There's also every now and then close by ranges:
> rmgr: XLOG        len (rec/tot):      4/    30, tx:          0, lsn: 1/9A404DB8, prev 1/9A404270, desc: NEXTOID
3311455
> rmgr: XLOG        len (rec/tot):      4/    30, tx:    7814505, lsn: 1/9A4EC888, prev 1/9A4EB9D0, desc: NEXTOID
3311461
>
>
> As far as I can see something like the above, or an oid wraparound, are
> pretty much deadly for toast.
>
> Is anybody ready with a good defense for SatisfiesToast not doing any
> actual liveliness checks?

I assume that this was installed as a performance optimization, and I
don't really see why it shouldn't be or be able to be made safe.  I
assume that the wraparound case was deemed safe because at that time
the idea of 4 billion OIDs getting used with old transactions still
active seemed inconceivable.  It seems to me that the real question
here is how you're getting two calls to XLogPutNextOid() with the same
value of ShmemVariableCache->nextOid, and the answer, as it seems to
me, must be that LWLocks are broken.  Either two processes are
managing to hold OidGenLock in exclusive mode at the same time, or
they're acquiring it in quick succession but without the second
process seeing all of the updates performed by the first process.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Rajeev rastogi
Date:
Subject: Re: asynchronous and vectorized execution
Next
From: Amit Kapila
Date:
Subject: Hash Indexes