From: Robert Haas
Subject: Re: foreign key locks, 2nd attempt
Date:
Msg-id: CA+TgmoakeJzSyhJQSHQrEoCciwFd4iw5-uCGpwZUe3jncN=Y=Q@mail.gmail.com
In response to: Re: foreign key locks, 2nd attempt (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
On Sun, Feb 26, 2012 at 9:47 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> Regarding performance, the good thing about this patch is that if you
>> have an operation that used to block, it might now not block.  So maybe
>> a multixact-related operation is a bit slower than before, but if it
>> allows you to continue operating rather than sit waiting until some
>> other transaction releases you, it's much better.
>
> That's probably true, although there is some deferred cost that is
> hard to account for.  You might not block immediately, but then later
> somebody might block either because the mxact SLRU now needs fsyncs or
> because they've got to decode an mxid long after the relevant segment
> has been evicted from the SLRU buffers.  In general, it's hard to
> bound that latter cost, because you only avoid blocking once (when the
> initial update happens) but you might pay the extra cost of decoding
> the mxid as many times as the row is read, which could be arbitrarily
> many.  How much of a problem that is in practice, I'm not completely
> sure, but it has worried me before and it still does.  In the worst
> case scenario, a handful of frequently-accessed rows with MXIDs all of
> whose members are dead except for the UPDATE they contain could result
> in continual SLRU cache-thrashing.
>
> From a performance standpoint, we really need to think not only about
> the cases where the patch wins, but also, and maybe more importantly,
> the cases where it loses.  There are some cases where the current
> mechanism, using SHARE locks for foreign keys, is adequate.  In
> particular, it's adequate whenever the parent table is not updated at
> all, or only very lightly.  I believe that those people will pay
> somewhat more with this patch, and especially in any case where
> backends end up waiting for fsyncs in order to create new mxids, but
> also just because I think this patch will have the effect of
> increasing the space consumed by each individual mxid, which imposes a
> distributed cost of its own.

I spent some time thinking about this over the weekend, and I have an
observation, and an idea.  Here's the observation: I believe that
locking a tuple whose xmin is uncommitted is always a noop, because if
it's ever possible for a transaction to wait for an XID that is part
of its own transaction (exact same XID, or sub-XIDs of the same top
XID), then a transaction could deadlock against itself.  I believe
that this is not possible: if a transaction were to wait for an XID
assigned to that same backend, then the lock manager would observe
that an ExclusiveLock on the xid is already held, so the request for a
ShareLock would be granted immediately.  I also don't believe there's
any situation in which the existence of an uncommitted tuple fails to
block another backend, but a lock on that same uncommitted tuple would
have caused another backend to block.  If any of that sounds wrong,
you can stop reading here (but please tell me why it sounds wrong).
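
To make that lock-manager argument concrete, here's a toy model of the
short-circuit I'm relying on.  This is emphatically not the real lock
manager (the one-slot lock table, lock_xid(), and the backend numbers
are all invented for illustration); it just shows why a backend that
already holds an ExclusiveLock on its own XID can never end up waiting
when it later asks for a ShareLock on that same XID:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint32_t TransactionId;

typedef enum { NO_LOCK, SHARE_LOCK, EXCLUSIVE_LOCK } LockMode;

typedef struct
{
    TransactionId xid;      /* XID the lock is tagged with */
    int           holder;   /* backend holding it, -1 if none */
    LockMode      mode;
} XidLock;

/* Returns true if the request is granted without waiting. */
static bool
lock_xid(XidLock *lock, TransactionId xid, int backend, LockMode mode)
{
    /* A backend never conflicts with a lock it already holds itself. */
    if (lock->holder == backend && lock->xid == xid)
        return true;
    if (lock->holder == -1)
    {
        lock->xid = xid;
        lock->holder = backend;
        lock->mode = mode;
        return true;
    }
    /* Conflicting holder (in this toy, any other backend): would block. */
    return false;
}

int
main(void)
{
    XidLock lock = { 0, -1, NO_LOCK };

    /* Backend 1 starts transaction 100 and takes ExclusiveLock on its XID. */
    lock_xid(&lock, 100, 1, EXCLUSIVE_LOCK);

    /* The same backend later "waits" on XID 100 by asking for a ShareLock:
     * the request is granted immediately, so no self-deadlock is possible. */
    printf("granted without waiting: %d\n",
           (int) lock_xid(&lock, 100, 1, SHARE_LOCK));
    return 0;
}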

If it's right, then here's the idea: what if we stored mxids in xmin
rather than xmax?  This would mean that, instead of making mxids
contain the tuple's original xmax, they'd need to contain the tuple's
original xmin.  This might seem like rearranging the deck chairs on
the Titanic, but I think it actually works out substantially
better, because if we can assume that the xmin is committed, then we
only need to know its exact value until it becomes older than
RecentGlobalXmin.  This means that a tuple can be both updated and
locked at the same time without the MultiXact SLRU needing to be
crash-safe, because if we crash and restart, any mxids that are still
around from before the crash are known to contain only xmins that are
now all-visible.  We therefore don't need their exact values, so it
doesn't matter if that data actually made it to disk.  Furthermore, in
the case where a previously-locked tuple is read repeatedly, we only
need to keep doing SLRU lookups until the xmin becomes all-visible;
after that, we can set a hint bit indicating that the tuple's xmin is
all-visible, and any future readers (or writers) can use that to skip
the SLRU lookup.  In the degenerate (and probably common) case where a
tuple is already all-visible at the time it's locked, we don't really
need to record the original xmin at all; we can still do so if
convenient, but we can set the xmin-all-visible hint right away, so
nobody needs to probe the SLRU just to get xmin.
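
In rough pseudo-C, the read path I have in mind would look something
like the sketch below.  Every name in it is made up for illustration
(a 16-entry array stands in for the MultiXact SLRU, the hint bit
HINT_XMIN_ALL_VISIBLE is hypothetical, and ToyTuple is certainly not
the real tuple header), but it shows the property I care about: the
SLRU is probed only until the stored xmin passes the horizon, and
never again after that:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint32_t TransactionId;
typedef uint32_t MultiXactId;

#define HINT_XMIN_ALL_VISIBLE 0x0001     /* hypothetical hint bit */
#define TOY_FROZEN_XID        ((TransactionId) 2)

typedef struct
{
    uint16_t    infomask;     /* per-tuple hint bits */
    MultiXactId xmin_mxid;    /* mxid stored where xmin used to be */
} ToyTuple;

/* Stand-in for the MultiXact SLRU: maps an mxid to the original xmin. */
static TransactionId toy_slru[16];
static int           slru_probes = 0;

static TransactionId
toy_slru_lookup(MultiXactId mxid)
{
    slru_probes++;
    return toy_slru[mxid % 16];
}

/* Modulo-2^32 comparison, in the style of TransactionIdPrecedes. */
static bool
xid_precedes(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) < 0;
}

/* Read path: probe the SLRU only until the stored xmin falls behind the
 * horizon, then set the hint bit so later readers skip the lookup. */
static TransactionId
toy_get_xmin(ToyTuple *tup, TransactionId recent_global_xmin)
{
    TransactionId xmin;

    if (tup->infomask & HINT_XMIN_ALL_VISIBLE)
        return TOY_FROZEN_XID;          /* exact value no longer matters */

    xmin = toy_slru_lookup(tup->xmin_mxid);
    if (xid_precedes(xmin, recent_global_xmin))
        tup->infomask |= HINT_XMIN_ALL_VISIBLE;

    return xmin;
}

int
main(void)
{
    ToyTuple tup = { 0, 5 };

    toy_slru[5] = 90;          /* original xmin preserved in the "SLRU" */

    toy_get_xmin(&tup, 80);    /* xmin 90 not yet behind horizon: probes SLRU */
    toy_get_xmin(&tup, 100);   /* now behind horizon: probes SLRU, sets hint */
    toy_get_xmin(&tup, 100);   /* hint bit set: no SLRU probe at all */

    printf("SLRU probes: %d\n", slru_probes);    /* prints 2 */
    return 0;
}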

In other words, we'd entirely avoid needing to make mxacts crash-safe,
and we'd avoid most of the extra SLRU lookups that the current
implementation requires; they'd only be needed when (and for as long
as) the locked tuple was not yet all-visible.

This also seems like it would make the anti-wraparound issues simpler
to handle - once an mxid is old enough that any xmin it contains must
be all-visible, we can simply overwrite the tuple's xmin with
FrozenXID, which is pretty much what we're already doing anyway.  It's
already the case that a table has to have an anti-wraparound vacuum at
least once after any given XID falls behind RecentGlobalXmin and
before it can be reused, so we wouldn't need to do anti-wraparound
vacuums any more frequently than currently.  There's still the problem
that we might exhaust the mxid space more quickly than the XID space,
but that exists in either implementation.
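
Continuing the toy layout from the sketch above (again, these are
invented names rather than actual code, and a real version would have
to worry about wraparound of the mxid counter itself, which this
simple comparison ignores), the freeze step would be about this
simple:

/* Once the mxid is older than the freeze cutoff, the xmin it carries
 * must be all-visible, so we can forget the SLRU reference entirely
 * and treat the tuple's xmin as FrozenXID from here on. */
static void
toy_freeze_tuple(ToyTuple *tup, MultiXactId cutoff_mxid)
{
    if (tup->xmin_mxid != 0 && tup->xmin_mxid < cutoff_mxid)
    {
        tup->xmin_mxid = 0;                      /* drop the SLRU reference */
        tup->infomask |= HINT_XMIN_ALL_VISIBLE;  /* i.e. stamp FrozenXID */
    }
}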

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

