Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE - Mailing list pgsql-hackers
From: Robert Haas
Subject: Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
Msg-id: CA+TgmobzvesvFWy+Et0K3Y8TwthLan-dr9n3MBStHrZKd+TE1g@mail.gmail.com
In response to: Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE (Peter Geoghegan <pg@heroku.com>)
Responses:
  Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
  Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
List: pgsql-hackers
On Fri, Sep 13, 2013 at 2:59 PM, Peter Geoghegan <pg@heroku.com> wrote:
> I would suggest letting those other individuals speak for themselves
> too. Particularly if you're going to name someone who is on vacation
> like that.

You seem to be under the impression that I'm mentioning Tom's name, or Andres's, because I need to win some kind of an argument. I don't. We're not going to accept a patch that uses lwlocks in the way that you are proposing.

> I mean, if we do the promise tuple
> thing, and there are multiple unique indexes, what happens when an
> inserter needs to block pending the outcome of another transaction?
> They had better go clean up the promise tuples from the other unique
> indexes that they're trying to insert into, because they cannot afford
> to hold value locks for a long time, no matter how they're
> implemented.

As Andres already pointed out, this is not correct. Just to add to what he said, we already have long-lasting value locks in the form of SIREAD locks. SIREAD locks can exist at different levels of granularity, but one of those levels is index-page granularity, where they guard against concurrent insertions of values that would fall within that page, which just so happens to be the same thing you want to do here. The difference between those locks and what you're proposing here is that they are implemented differently. That is why those were acceptable and this is not.

> That could take much longer than just releasing a shared
> buffer lock, since for each unique index the promise tuple must be
> re-found from scratch. There are huge issues with additional
> complexity and bloat. Oh, and now your lightweight locks aren't so
> lightweight any more.

Yep, totally agreed. If you simply lock the buffer, or take some other action which freezes out all concurrent modifications to the page, then re-finding the lock is much simpler. On the other hand, it's much simpler precisely because you've reduced concurrency to the degree necessary to make it simple. And reducing concurrency is bad.

Similarly, complexity and bloat are not great things taken in isolation, but many of our existing locking schemes are already very complex. Tuple locks result in a complex jig that involves locking the tuple via the heavyweight lock manager, performing a WAL-logged modification to the page, and then releasing the lock in the heavyweight lock manager. As here, that is way more expensive than simply grabbing and holding a share-lock on the page. But we get a number of important benefits out of it. The backend remains interruptible while the tuple is locked, the protocol for granting locks is FIFO to prevent starvation, we don't suppress page eviction while the lock is held, we can simultaneously lock arbitrarily large numbers of tuples, and deadlocks are detected and handled cleanly. If those requirements were negotiable, we would surely have negotiated them away already, because the performance benefits would be immense.

> If the value locks were made interruptible through some method, such
> as the promise tuples approach, does that really make deadlocking
> acceptable?

Yes. It's not possible to prevent all deadlocks. It IS possible to make sure that they are properly detected and that precisely one of the transactions involved is rolled back to resolve the deadlock.
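[Editor's note: for readers who want to see the shape of the "complex jig" described above, here is a minimal sketch modeled on the tuple-locking pattern in heapam.c. It is not the verbatim source; variable names are invented and the real code handles many more cases.]

    /*
     * Sketch of the tuple-locking pattern described above, NOT the literal
     * heap_lock_tuple()/heap_update() code.  The potentially long wait
     * happens in the heavyweight lock manager (LockTuple), where it is
     * interruptible, FIFO-queued, and covered by the deadlock detector;
     * the buffer lwlock is held only for the short WAL-logged page change.
     */
    LockTuple(relation, &tid, ExclusiveLock);     /* may sleep; deadlock-checked */

    LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);    /* short critical section */
    START_CRIT_SECTION();
    /* ... set the tuple's xmax/infomask bits on the page ... */
    MarkBufferDirty(buffer);
    /* ... XLogInsert() the change and PageSetLSN() the page ... */
    END_CRIT_SECTION();
    LockBuffer(buffer, BUFFER_LOCK_UNLOCK);

    UnlockTuple(relation, &tid, ExclusiveLock);   /* drop the heavyweight lock */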
> You can hardly compare a buffer's LWLock with a system one that
> protects critical shared memory structures. We're talking about a
> shared lock on a single btree leaf page per unique index involved in
> upserting.

Actually, I can and I am. Buffers ARE critical shared memory structures.

>> A further problem is that a backend which holds even one lwlock can't
>> be interrupted. We've had this argument before and it seems that you
>> don't think that non-interruptibility is a problem, but it is project
>> policy to allow for timely interrupts in all parts of the backend and
>> we're not going to change that policy for this patch.
>
> I don't think non-interruptibility is a problem? Really, do you think
> that this kind of inflammatory rhetoric helps anybody? I said nothing
> of the sort. I recall saying something about an engineering trade-off.
> Of course I value interruptibility.

I don't see what's inflammatory about that statement. The point is that this isn't the first time you've proposed a change which would harm interruptibility, and it isn't the first time I've objected on precisely that basis. Interruptibility is not a nice-to-have that we can trade away from time to time; it's essential and non-negotiable.

> If you're concerned about non-interruptibility, consider XLogFlush().
> That does rather a lot of work with WALWriteLock exclusive locked. On
> a busy system, some backend is very frequently going to experience a
> non-interruptible wait for the duration of however long it takes to
> write and flush perhaps a whole segment. All other flushing backends
> are stuck in non-interruptible waits waiting for that backend to
> finish. I think that the group commit stuff might have regressed
> worst-case interruptibility for flushers by quite a bit; should we
> have never committed that, or do you agree with my view that it's
> worth it?

It wouldn't take a lot to convince me that it wasn't worth it, because I was never all that excited about that patch to begin with. I think it mostly helps in extremely artificial situations that are not likely to occur on real systems anyway. But, yeah, WALWriteLock is a problem, no doubt about it. We should try to make the number of such problems go down, not up, even if it means passing up new features that we'd really like to have.

> In contrast, what I've proposed here is in general quite unlikely to
> result in any I/O for the duration of the time the locks are held.
> Only writers will be blocked. And only those inserting into a narrow
> range of values around the btree leaf page. Much of the work that even
> those writers need to do will be unimpeded anyway; they'll just block
> on attempting to acquire an exclusive lock on the first btree leaf
> page that the value they're inserting could be on.

Sure, but you're talking about broadening the problem from the guy performing the insert to everybody who might be trying to perform an insert that hits one of the same unique-index pages. Instead of holding one buffer lock, the guy performing the insert is now holding as many buffer locks as there are indexes. That's a non-trivial issue.

For that matter, if the table has more than MAX_SIMUL_LWLOCKS indexes, you'll error out. In fact, if you get the number of indexes exactly right, you'll exceed MAX_SIMUL_LWLOCKS in visibilitymap_clear() and panic the whole system. Oh, and if different backends load the index list in different orders, because say the system catalog gets vacuumed between their respective relcache loads, then they may try to lock the indexes in different orders and cause an undetected deadlock.
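[Editor's note: to make the MAX_SIMUL_LWLOCKS point concrete, the lwlock machinery records the locks a backend holds in a small fixed-size array. The following is a paraphrase of the bookkeeping in lwlock.c of that era, not verbatim source; the constant and details have changed across releases.]

    /* Paraphrased from lwlock.c (9.3-era); not the exact source. */
    #define MAX_SIMUL_LWLOCKS  100

    static LWLockId held_lwlocks[MAX_SIMUL_LWLOCKS];
    static int      num_held_lwlocks = 0;

    void
    LWLockAcquire(LWLockId lockid, LWLockMode mode)
    {
        /* ... */
        if (num_held_lwlocks >= MAX_SIMUL_LWLOCKS)
            elog(ERROR, "too many LWLocks taken");
        /* ... on success: held_lwlocks[num_held_lwlocks++] = lockid; ... */
    }

A scheme that holds one buffer lock per unique index runs straight into that hard cap, and if the cap is hit while inside a critical section, as it would be in visibilitymap_clear() during the heap insert, the ERROR is promoted to a PANIC, which is the failure mode described above.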
And, drifting a bit further off-topic, even to get as far as you have, you've added overhead to every lwlock acquisition and release, even for people who never use this functionality. I'm pretty skeptical about anything that involves adding additional frammishes to the lwlock mechanism. There are a few new primitives I'd like, too, but every one we add slows things down for everybody.

> And the additional
> non-interruptible wait of those inserters won't be terribly much more
> than the wait of the backend where heap tuple insertion takes a long
> time anyway - that guy already has to do close to 100% of that work
> with a non-interruptible wait today (once we eliminate
> heap_prepare_insert() and toasting). The UnlockReleaseBuffer() call is
> right at the end of heap_insert, and the buffer is pinned and locked
> very close to the start.

That's true but somewhat misleading. Textually, most of the function holds the buffer lock, but heap_prepare_insert(), CheckForSerializableConflictIn(), RelationGetBufferForTuple(), and XLogWrite() are the parts that do substantial amounts of computation, and only the last of those happens while holding the buffer lock. And that last is really fundamental, because we can't let any other backend see the modified buffer until we've xlog'd the changes. The problems you're proposing to create do not fall into the same category.
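[Editor's note: for readers without the source handy, here is a simplified outline of the code path being discussed, following the prose above. It is not the verbatim heap_insert(); exact signatures and ordering vary by release.]

    /*
     * Simplified outline of heap_insert(), NOT the verbatim source.  The
     * expensive preparatory steps run before the buffer content lock is
     * taken; only the page modification and the XLOG record insertion
     * (which may in turn wait in XLogWrite()) happen while it is held.
     */
    heaptup = heap_prepare_insert(relation, tup, xid, cid, options); /* incl. TOAST work */
    CheckForSerializableConflictIn(relation, NULL, InvalidBuffer);   /* SSI conflict check */
    buffer = RelationGetBufferForTuple(relation, heaptup->t_len,     /* pins and exclusive- */
                                       InvalidBuffer, options,       /* locks a target page */
                                       bistate, &vmbuffer, NULL);

    START_CRIT_SECTION();
    /* ... put the tuple on the page, clear the visibility map bit ... */
    MarkBufferDirty(buffer);
    /* ... XLogInsert() the new tuple and PageSetLSN() the page ... */
    END_CRIT_SECTION();

    UnlockReleaseBuffer(buffer);    /* content lock held only for the block above */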
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company