Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE - Mailing list pgsql-hackers

From: Peter Geoghegan
Subject: Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
Msg-id: CAM3SWZRNPgZeTwHgzwGHXmC1aVi9=9A3Fi-WnQh=pS9uPjQJNQ@mail.gmail.com
In response to: Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE (Andres Freund <andres@2ndquadrant.com>)
List: pgsql-hackers
On Fri, Sep 13, 2013 at 12:23 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> The reason I wasn't saying "this will never get accepted" are twofold:
> a) I don't want to stifle alternative ideas to the "promises" idea,
> just because I think it's the way to go. That might stop a better idea
> from being articulated. b) I am not actually in the position to say it's
> not going to be accepted.

Well, the reality is that the promises idea hasn't been described in remotely enough detail to compare it to what I have here. I've pointed out plenty of problems with it. After all, it was the first thing that I considered, and I'm on the record talking about it at the 2012 dev meeting. I didn't take that approach, for many good reasons.

The reason I ended up here is not because I didn't get the memo about holding buffer locks across complex operations being a bad thing. At least grant me that. I'm here because in all these years no one has come up with a suggestion that doesn't have some very major downsides - like, even worse than this.

>> As to the rules you refer to, you must mean "These locks are intended
>> to be short-term: they should not be held for long". I don't think
>> that they will ever be held for long. At least, when I've managed the
>> amount of work that a heap_insert() can do better. I expect to produce
>> a revision where toasting doesn't happen with the locks held soon.
>> Actually, I've already written the code, I just need to do some
>> testing.
>
> I personally think - and have stated so before - that doing a
> heap_insert() while holding the btree lock is unacceptable.

Presumably your reason is essentially that we exclusive-lock a heap buffer (exactly one heap buffer) while holding shared locks on btree index buffers. Is that really so different from holding an exclusive lock on a btree buffer while holding a shared lock on a heap buffer? Because that's what _bt_check_unique() does today.
Now, I'll grant you that there is one appreciable difference, which is that multiple unique indexes may be involved. But limiting ourselves to the primary key or something like that remains an option. And I'm not sure that it's really any worse anyway.

>> The btree code is different, though: It implements a well-defined
>> interface, with much clearer separation of concerns.
>
> Which you're completely violating by linking the btree buffer locking
> with the heap locking. It's not about the btree code alone.

You're right that it isn't about just the btree code. In order for a deadlock to occur, there must be a mutual dependency. What code could feel entitled to hold buffer locks on btree buffers and heap buffers at the same time, except the btree code itself? It already does so, but no one else does the same thing. If anyone did anything with a heap buffer lock held that could result in a call into one of the btree access method functions (I'm not contemplating the possibility of this other code locking the btree buffer *directly*), I'm quite sure that it would be rejected outright today, because it would cause deadlocks. Certainly, vacuumlazy.c doesn't do it, for example. Why would anyone ever want to do that anyway? I cannot think of any reason.

I suppose that that does still leave "transitive dependencies", but now you're stretching things. After all, you're not supposed to hold buffer locks for long! The dependency would have to transit through, say, one of the system LWLocks used for WAL logging. It seems pretty close to impossible that it'd be an issue - index stuff is only WAL-logged as index tuples are inserted (that is, as the locks are finally released). Everyone automatically does that kind of thing in a consistent order of locking, unlocking in the opposite order, anyway.

But what of the btree code deadlocking with itself? There are only a few functions (2 or 3) where that's possible even in principle, and I think they're going to be not too hard to analyze.
For example, with insertion, the trick is to always lock in a consistent order and unlock/insert in the opposite order. The shared heap buffer lock(s) needed in the btree code cannot deadlock with another upserter, because once the other upserter has that exclusive heap buffer lock, it's *inevitable* that it will release all of its shared buffer locks. Granted, I could stand to think about this case more, but you get the idea - it *is* possible to clamp down on the code that needs to care about this stuff to a large degree. It's subtle, but btrees are generally considered pretty complicated, and the btree code already cares about some odd cases like these (it takes special precautions for catalog indexes, for example).

The really weird thing about my patch is that the btree code trusts the executor to call the heapam code to do the right thing in the right way - it now knows more than I'd prefer. Would you be happier if the btree code took more direct responsibility for the heap tuple insertion instead? Before you say "that's ridiculous", consider the big modularity violation that has always existed. It may be no more ridiculous than that. And that existing state of affairs may be no less ridiculous than living with what I've already done.

> At this point I am a bit confused why you are asking for review.

I am asking for us, collectively, through consensus, to resolve the basic approach to doing this. That was something I stated right up front, pointing out details of where the discussion had gone in the past. That was my explicit goal. There has been plenty of discussion of this down through the years, but nothing ever came of it.

Why has this been an intractable problem for over a decade for us alone? Why isn't this a problem for other database systems? I'm not implying that it's because they do this. It's something that I am earnestly interested in, though. A number of people have asked me that, and I don't have a good answer for them.
>> I mean, if we do the promise tuple thing, and there are multiple
>> unique indexes, what happens when an inserter needs to block pending
>> the outcome of another transaction? They had better go clean up the
>> promise tuples from the other unique indexes that they're trying to
>> insert into, because they cannot afford to hold value locks for a long
>> time, no matter how they're implemented.
>
> Why? We're using normal transaction visibility rules here. We don't stop
> *other* values on the same index getting updated or similar.

Because you're locking a value in some other, earlier unique index, all the while waiting *indefinitely* on some other value in a second or subsequent one. That isn't acceptable. A bunch of backends would back up just because one backend had contention on the second unique index value that the others didn't actually have themselves. My design allows those other backends to immediately go through and finish.

Value locks have these kinds of hazards no matter how you implement them. Deadlocks, and unreasonable stalling as described here, are always unacceptable - whether or not the problems are detected at runtime is ultimately of marginal interest. Either way, it's a bug.

I think that the details of how this approach compares to others are totally pertinent. For me, that's the whole point - getting towards something that will balance all of these concerns and be acceptable. Yes, it's entirely possible that that could look quite different from what I have here. I do not want to reduce all this to a question of "is this one design acceptable or not?". Am I not allowed to propose a design to drive discussion? That's how the most important features get implemented around here.

--
Peter Geoghegan