Thread: Re: [COMMITTERS] pgsql: Implement sharable row-level locks, and use them for foreign key

On Thu, Apr 28, 2005 at 06:47:18PM -0300, Tom Lane wrote:

> Implement sharable row-level locks, and use them for foreign key references
> to eliminate unnecessary deadlocks.  This commit adds SELECT ... FOR SHARE
> paralleling SELECT ... FOR UPDATE.  The implementation uses a new SLRU
> data structure (managed much like pg_subtrans) to represent multiple-
> transaction-ID sets.

One point I didn't quite understand was the business about XLogging
heap_lock_tuple.  I had to reread your mail to -hackers on this issue
several times to get it (as you can see I don't fully grok the WAL
rules).  Now, I believe that heap_mark4update was wrong on this, no?
Only it didn't matter because after a crash nobody cared about the
stored Xmax.

One nice side effect of this is that the 2PC patch now has this problem
solved.  The bad part is that locking a tuple emits a (non-XLogFlushed)
WAL record and it may have a performance impact.  (We should have better
performance overall I think, because transactions are no longer locked
on foreign key checking.)


Anyway: many thanks for updating the patch to a usable state.  I'm
sorry to have inflicted all those bugs upon you.

-- 
Alvaro Herrera (<alvherre[@]dcc.uchile.cl>)
"La soledad es compañía"


Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> One point I didn't quite understand was the business about XLogging
> heap_lock_tuple.  I had to reread your mail to -hackers on this issue
> several times to get it (as you can see I don't fully grok the WAL
> rules).  Now, I believe that heap_mark4update was wrong on this, no?
> Only it didn't matter because after a crash nobody cared about the
> stored Xmax.

Well, actually the reason I decided to put in xlogging there was that
I realized it was already broken before.  In the existing code it was
possible to have this scenario:

* transaction N selects-for-update some tuple, so N goes into
  the tuple's XMAX.
* transaction N ends without doing anything else.  Since it's
  not produced any XLOG entries, xact.c thinks it doesn't need
  to emit either a COMMIT or ABORT xlog record.
* therefore, there is no record whatsoever of XID N in XLOG.
* bgwriter pushes the dirty data page to disk.
* database crashes.
* on restart, WAL replay sets the XID counter to N or less,
  because there is no evidence in the XLOG for N.
* now there will be a "new" transaction N that is mistakenly
  considered to own an update lock on the tuple.
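
The scenario above can be sketched as a toy simulation.  This is not
PostgreSQL code: ToyCluster, lock_only_transaction, crash_and_recover
and xid_reused_after_crash are hypothetical names, the "WAL" is just a
list of XIDs assumed to survive the crash, and recovery is reduced to
"restart the XID counter just past the highest XID seen in WAL, or at
the last checkpointed value".  Under those assumptions it shows the
stale XMAX being picked up by a new transaction when the lock is not
logged, and not when it is.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToyCluster:
    """Toy model (assumption): one on-disk tuple, a WAL, an XID counter."""
    log_lock: bool                      # emit a WAL record for the row lock?
    next_xid: int = 100                 # in-memory counter, lost on crash
    checkpoint_next_xid: int = 100      # counter value saved at last checkpoint
    wal: List[int] = field(default_factory=list)  # XIDs mentioned in WAL
    disk_xmax: Optional[int] = None     # xmax of the tuple as stored on disk

    def lock_only_transaction(self) -> int:
        """Transaction that only locks the tuple, then ends."""
        xid = self.next_xid
        self.next_xid += 1
        if self.log_lock:
            self.wal.append(xid)        # lock record carries the XID
        # Without a lock record the transaction has written no WAL at all,
        # so no COMMIT/ABORT record is emitted for it either.
        self.disk_xmax = xid            # dirty page reaches disk before crash
        return xid

    def crash_and_recover(self) -> None:
        """Lose in-memory state; rebuild the XID counter from WAL evidence."""
        highest_seen = max(self.wal, default=self.checkpoint_next_xid - 1)
        self.next_xid = max(self.checkpoint_next_xid, highest_seen + 1)

    def begin(self) -> int:
        """Start a new transaction and return its XID."""
        xid = self.next_xid
        self.next_xid += 1
        return xid


def xid_reused_after_crash(log_lock: bool) -> bool:
    """True if a post-crash transaction gets the XID stored in the tuple."""
    cluster = ToyCluster(log_lock=log_lock)
    cluster.lock_only_transaction()
    cluster.crash_and_recover()
    newcomer = cluster.begin()
    return newcomer == cluster.disk_xmax


if __name__ == "__main__":
    print("lock not logged, XID reused:", xid_reused_after_crash(False))
    print("lock logged, XID reused:    ", xid_reused_after_crash(True))
```

In this model the only difference between the two runs is whether the
lock itself leaves a trace of the XID in WAL, which is the point of
logging heap_lock_tuple.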
 

While the negative impact of this situation is probably not high,
it's clearly The Wrong Thing.

The MultiXactId patch introduces a second way to have the same
problem, ie a MultiXactId on disk for which there is no evidence
in XLOG, so the MXID might get re-used after restart.

In view of the fact that we want to do 2PC sometime soon, and that
absolutely requires xlogging every lock, I thought that continuing to
try to avoid emitting an xlog record for heap_lock_tuple was just silly.
        regards, tom lane