Re: foreign key locks, 2nd attempt - Mailing list pgsql-hackers
From: Robert Haas
Subject: Re: foreign key locks, 2nd attempt
Date:
Msg-id: CA+TgmoakeJzSyhJQSHQrEoCciwFd4iw5-uCGpwZUe3jncN=Y=Q@mail.gmail.com
In response to: Re: foreign key locks, 2nd attempt (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: foreign key locks, 2nd attempt; Re: foreign key locks, 2nd attempt
List: pgsql-hackers
On Sun, Feb 26, 2012 at 9:47 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> Regarding performance, the good thing about this patch is that if you
>> have an operation that used to block, it might now not block.  So maybe
>> multixact-related operations are a bit slower than before, but if it
>> allows you to continue operating rather than sit waiting until some
>> other transaction releases you, it's much better.
>
> That's probably true, although there is some deferred cost that is
> hard to account for.  You might not block immediately, but then later
> somebody might block either because the mxact SLRU now needs fsyncs or
> because they've got to decode an mxid long after the relevant segment
> has been evicted from the SLRU buffers.  In general, it's hard to bound
> that latter cost, because you only avoid blocking once (when the
> initial update happens) but you might pay the extra cost of decoding
> the mxid as many times as the row is read, which could be arbitrarily
> many.  How much of a problem that is in practice, I'm not completely
> sure, but it has worried me before and it still does.  In the worst
> case scenario, a handful of frequently-accessed rows with MXIDs all of
> whose members are dead except for the UPDATE they contain could result
> in continual SLRU cache-thrashing.
>
> From a performance standpoint, we really need to think not only about
> the cases where the patch wins, but also, and maybe more importantly,
> the cases where it loses.  There are some cases where the current
> mechanism, using SHARE locks for foreign keys, is adequate.  In
> particular, it's adequate whenever the parent table is not updated at
> all, or only very lightly.  I believe that those people will pay
> somewhat more with this patch, especially in any case where backends
> end up waiting for fsyncs in order to create new mxids, but also just
> because I think this patch will have the effect of increasing the
> space consumed by each individual mxid, which imposes a distributed
> cost of its own.

I spent some time thinking about this over the weekend, and I have an
observation, and an idea.

Here's the observation: I believe that locking a tuple whose xmin is
uncommitted is always a no-op, because if it were ever possible for a
transaction to wait for an XID that is part of its own transaction (the
exact same XID, or a sub-XID of the same top-level XID), then a
transaction could deadlock against itself.  I believe that this is not
possible: if a transaction were to wait for an XID assigned to that same
backend, the lock manager would observe that an ExclusiveLock on the XID
is already held, so the request for a ShareLock would be granted
immediately.  I also don't believe there's any situation in which the
existence of an uncommitted tuple fails to block another backend, but a
lock on that same uncommitted tuple would have caused another backend to
block.

If any of that sounds wrong, you can stop reading here (but please tell
me why it sounds wrong).  If it's right, then here's the idea: what if
we stored mxids using xmin rather than xmax?  This would mean that,
instead of making mxids contain the tuple's original xmax, they'd
instead need to contain the tuple's original xmin.  This might seem like
rearranging the deck chairs on the Titanic, but I think it actually
works out substantially better, because if we can assume that the xmin
is committed, then we only need to know its exact value until it becomes
older than RecentGlobalXmin.
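Going back to the observation for a moment, here is a minimal,
self-contained C sketch of it.  The type and function names below
(CurrentXactState, must_wait_for_xmin, and so on) are invented for
illustration and are not the real PostgreSQL symbols; the only point is
the control flow: a would-be locker never actually sleeps on an
uncommitted xmin that belongs to its own transaction, so locking such a
tuple degenerates to a no-op.

/*
 * Hypothetical, simplified sketch (not the real heap_lock_tuple path):
 * models the claim that waiting on a tuple's uncommitted xmin is a no-op
 * when that xmin belongs to the waiter's own transaction, because the
 * backend already holds ExclusiveLock on its own XIDs, so a ShareLock
 * request on one of them is granted immediately instead of blocking.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint32_t TransactionId;

/* XIDs (top-level plus subtransactions) owned by the current backend. */
typedef struct
{
    TransactionId xids[8];
    int           nxids;
} CurrentXactState;

static bool
xid_is_own(const CurrentXactState *cur, TransactionId xid)
{
    for (int i = 0; i < cur->nxids; i++)
        if (cur->xids[i] == xid)
            return true;
    return false;
}

/* Returns true only if the caller genuinely has to sleep on xmin's XID lock. */
static bool
must_wait_for_xmin(const CurrentXactState *cur, TransactionId xmin,
                   bool xmin_committed)
{
    if (xmin_committed)
        return false;           /* nothing left to wait for */
    if (xid_is_own(cur, xmin))
        return false;           /* ShareLock on our own XID is granted at once */
    return true;                /* uncommitted xmin from some other backend */
}

int
main(void)
{
    CurrentXactState cur = { .xids = { 100, 101 }, .nxids = 2 };

    printf("%d\n", must_wait_for_xmin(&cur, 101, false));   /* 0: own sub-XID */
    printf("%d\n", must_wait_for_xmin(&cur, 205, false));   /* 1: must wait */
    return 0;
}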
Storing the original xmin in the mxid this way means that a tuple can be
both updated and locked at the same time without the MultiXact SLRU
needing to be crash-safe, because if we crash and restart, any mxids
that are still around from before the crash are known to contain only
xmins that are now all-visible.  We therefore don't need their exact
values, so it doesn't matter whether that data actually made it to disk.

Furthermore, in the case where a previously-locked tuple is read
repeatedly, we only need to keep doing SLRU lookups until the xmin
becomes all-visible; after that, we can set a hint bit indicating that
the tuple's xmin is all-visible, and any future readers (or writers) can
use that to skip the SLRU lookup.  In the degenerate (and probably
common) case where a tuple is already all-visible at the time it's
locked, we don't really need to record the original xmin at all; we can
still do so if convenient, but we can set the xmin-all-visible hint
right away, so nobody needs to probe the SLRU just to get the xmin.

In other words, we'd entirely avoid needing to make mxacts crash-safe,
and we'd avoid most of the extra SLRU lookups that the current
implementation requires; they'd be needed only when (and for as long as)
the locked tuple was not yet all-visible.

This also seems like it would make the anti-wraparound issues simpler to
handle: once an mxid is old enough that any xmin it contains must be
all-visible, we can simply overwrite the tuple's xmin with FrozenXID,
which is pretty much what we're already doing anyway.  It's already the
case that a table has to have an anti-wraparound vacuum at least once
after any given XID falls behind RecentGlobalXmin and before it can be
reused, so we wouldn't need to do anti-wraparound vacuums any more
frequently than we do now.  There's still the problem that we might
exhaust the mxid space more quickly than the XID space, but that exists
in either implementation.
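As a rough illustration of the read path this would buy us, here is a
simplified, self-contained C sketch.  Everything in it
(HINT_XMIN_ALL_VISIBLE, get_effective_xmin, the stand-in for
RecentGlobalXmin, and so on) is a hypothetical name chosen for
illustration rather than the actual PostgreSQL API, and XID wraparound
is ignored for brevity; the idea it tries to capture is simply that the
SLRU probe is needed only until the stored xmin falls behind
RecentGlobalXmin, after which a hint bit lets every later reader (or
writer) skip it entirely.

/*
 * Hypothetical sketch, under the assumptions in this mail: the mxid in
 * the tuple stands in for the original xmin, and a hint bit records that
 * this xmin has become all-visible so later readers can skip the
 * MultiXact SLRU lookup.  None of these names are real PostgreSQL
 * symbols.
 */
#include <stdint.h>
#include <stdio.h>

typedef uint32_t TransactionId;
typedef uint32_t MultiXactId;

#define HINT_XMIN_ALL_VISIBLE   0x0001
#define FROZEN_XID              ((TransactionId) 2)

typedef struct
{
    MultiXactId mxid;          /* locker info; also keys the stored original xmin */
    uint16_t    infomask;      /* hint bits */
} TupleHeader;

/* Stand-in for RecentGlobalXmin: xids older than this are all-visible. */
static TransactionId recent_global_xmin = 500;

/*
 * Stand-in for the SLRU lookup: fetch the original xmin recorded for an
 * mxid.  In this scheme the stored value only matters while it is not yet
 * all-visible, which is why it does not have to survive a crash.
 */
static TransactionId
mxid_get_original_xmin(MultiXactId mxid)
{
    (void) mxid;
    return 480;                /* pretend the SLRU says the xmin was 480 */
}

/*
 * Reader path: return the xmin to use for visibility checks, probing the
 * SLRU only until the xmin is known to be all-visible.
 */
static TransactionId
get_effective_xmin(TupleHeader *tup)
{
    if (tup->infomask & HINT_XMIN_ALL_VISIBLE)
        return FROZEN_XID;                     /* no SLRU probe needed */

    TransactionId xmin = mxid_get_original_xmin(tup->mxid);

    if (xmin < recent_global_xmin)             /* wraparound ignored here */
    {
        tup->infomask |= HINT_XMIN_ALL_VISIBLE;    /* future readers skip the SLRU */
        return FROZEN_XID;                     /* exact value no longer matters */
    }
    return xmin;                               /* still need the exact value */
}

int
main(void)
{
    TupleHeader tup = { .mxid = 7, .infomask = 0 };

    printf("first read:  %u\n", (unsigned) get_effective_xmin(&tup)); /* probes the "SLRU" */
    printf("second read: %u\n", (unsigned) get_effective_xmin(&tup)); /* hint short-circuits */
    return 0;
}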
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company