Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE - Mailing list pgsql-hackers
From: Peter Geoghegan
Subject: Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
Msg-id: CAM3SWZQ9XMM8bZyNX3memy1AMQcKqXuUSy8t1iFqZz999U_AGQ@mail.gmail.com
In response to: Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE (Heikki Linnakangas <hlinnakangas@vmware.com>)
List: pgsql-hackers
On Tue, Nov 19, 2013 at 5:13 AM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
> Ok. Which use case are you targeting during this initial effort, batch
> updates or small OLTP transactions?

OLTP transactions are probably my primary concern. I just realized that I wasn't actually very clear on that point in my most recent e-mail -- my apologies. What we really need for batching, and what we should work towards in the medium term, is MERGE, where a single table scan does everything. However, I also care about facilitating conflict resolution in multi-master replication systems, so I think we definitely ought to consider that carefully if at all possible.

Incidentally, Andres said a few weeks back that he thinks that what I've proposed ought to be exposed only to C code, owing to the fact that it necessitates the visibility trick (actually, I think UPSERT does generally, but what I've done has, I suppose, necessitated making it more explicit/general - i.e. modifications are added to HeapTupleSatisfiesMVCC()). I don't understand what difference it makes to only expose it at the C level - what I've proposed in this area is either correct or incorrect (Andres mentioned the Halloween problem). Furthermore, I presume that it's broadly useful to have Bucardo-style custom conflict resolution policies without people having to get their hands dirty with C, and I think having this at the SQL level helps there. Plus, as I've said many times, the flexibility this syntax offers is likely to be broadly useful for ordinary SQL clients - this is almost as good as SQL MERGE for many cases.

>> Seems like an awful lot of additional mechanism.
>
> Not really. Once you have the code in place to do the kill-inserted-tuple
> dance on a conflict, all you need is to do an extra index search before it.
> And once you have that, it's not hard to add some kind of a heuristic to
> either do the pre-check or skip it.

Perhaps.
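[Editor's note: to make the pre-check-or-skip heuristic above concrete, here is a rough toy model. It is entirely illustrative - a Python sketch, not anything resembling either patch's actual C code - of the idea of probing for a duplicate before insertion, but disabling the probe once the last N insertions have not been duplicates, accepting a little index bloat on the rare conflict instead. All class and method names are invented.]

```python
# Toy model of the suggested heuristic: pre-check a unique index before
# inserting, but skip the pre-check once duplicates prove rare.  A dict
# stands in for the unique index; nothing here is real PostgreSQL code.

class UpsertTable:
    def __init__(self, sample=100):
        self.index = {}        # unique key -> row
        self.outcomes = []     # True where an upsert hit a duplicate
        self.sample = sample   # e.g. "the last 100 insertions"

    def _precheck_worthwhile(self):
        recent = self.outcomes[-self.sample:]
        # keep pre-checking until a full recent window had no duplicates
        return len(recent) < self.sample or any(recent)

    def upsert(self, key, row):
        if self._precheck_worthwhile() and key in self.index:
            self.index[key] = row          # conflict found cheaply: update
            self.outcomes.append(True)
            return "updated"
        if key in self.index:
            # optimistic insertion hit a conflict after all; a real index
            # would leave a dead ("killed") index tuple behind here
            self.index[key] = row
            self.outcomes.append(True)
            return "updated-with-bloat"
        self.index[key] = row
        self.outcomes.append(False)
        return "inserted"
```

The point of the sketch is only that the bloat-producing path is taken solely when the heuristic has turned the pre-check off, which matches the "rare in practice" claim.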
> I probably shouldn't have mentioned markpos/restrpos, you're right that it's
> not a good idea to conflate that with index insertion. Nevertheless, some
> kind of an API for doing a duplicate-key check prior to insertion, and
> remembering the location for the actual insert later, seems sensible. It's
> certainly no more of a modularity violation than the value-locking scheme
> you're proposing.

I'm not so sure - in principle, any locking implementation can be used by any conceivable amcanunique indexing method. The core system knows that it isn't okay to sit on them all day long, but that doesn't seem very onerous.

>> I'm certainly not opposed to making something like this work for
>> exclusion constraints. Certainly, I want this to be as general as
>> possible. But I don't think that it needs to be a blocker, and I don't
>> think we gain anything in code footprint by addressing that by being
>> as general as possible in our approach to the basic concurrency issue.
>> After all, we're going to have to repeat the basic pattern in multiple
>> modules.
>
> Well, I don't know what to say. I *do* have a hunch that we'd gain much in
> code footprint by making this general. I don't understand what pattern you'd
> need to repeat in multiple modules.

Now that I see this rough patch, I better appreciate what you mean. I withdraw this objection.

> Here's a patch, implementing a rough version of the scheme I'm trying to
> explain. It's not as polished as yours, but it ought to be enough to
> evaluate the code footprint and performance. It doesn't make any changes to
> the indexam API, and it works the same with exclusion constraints and unique
> constraints. As it stands, it doesn't leave bloat behind, except when a
> concurrent insertion with a conflicting key happens between the first
> "pre-check" and the actual insertion. That should be rare in practice.
>
> What have you been using to performance test this?
I was just testing my patch against a custom pgbench workload, involving running upserts against a table from a fixed range of PK values. It has proven kind of difficult to benchmark this in the way that pgbench has proven useful for in the past. Pretty soon the table's PK range is "saturated", so they're all updates; but on the other hand, how do you balance the INSERT and UPDATE cases? Multiple unique indexes are the interesting case for comparing both approaches. I didn't really worry about performance so much as correctness, and for multiple unique constraints your approach clearly falls down, as explained below.

>> Is it even useful to lock multiple rows if we can't really
>> update them, because they'll overlap each other when all updated with
>> the one value?
>
> Hmm. I think what you're referring to is the case where you try to insert a
> row so that it violates an exclusion constraint, and in a way that it
> conflicts with a large number of existing tuples. For example, if you have a
> calendar application with a constraint that two reservations must not
> overlap, and you try to insert a new reservation that covers, say, a whole
> decade.

Right.

> That's not a problem for ON DUPLICATE KEY IGNORE, as you just ignore the
> conflict and move on. For ON DUPLICATE KEY LOCK FOR UPDATE, I guess we would
> need to handle a large TID array. Or maybe we can arrange it so that the
> tuples are locked as we scan them, without having to collect them all in a
> large array.
>
> (the attached patch only locks the first existing tuple that conflicts; that
> needs to be fixed)

I'm having a hard time seeing how ON DUPLICATE KEY LOCK FOR UPDATE is of very much use to exclusion constraints at all. Perhaps I lack imagination here. However, ON DUPLICATE KEY IGNORE certainly *is* useful with exclusion constraints, and I'm not dismissive of that. I think we ought to at least be realistic about the concerns that inform your approach here.
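[Editor's note: the saturation effect described above is easy to see in a trivial simulation - this is the editor's own toy model, with invented names, and has nothing to do with pgbench internals.]

```python
# Toy model of the benchmarking difficulty: with keys drawn uniformly
# from a fixed range, nearly every upsert soon takes the UPDATE path,
# because almost all keys in the range have already been inserted.
import random

def simulate_upserts(n_ops, key_range, seed=42):
    random.seed(seed)
    table = set()            # stands in for the table's primary keys
    inserts = 0
    for _ in range(n_ops):
        key = random.randrange(key_range)
        if key not in table:
            table.add(key)
            inserts += 1     # INSERT path taken
        # else: UPDATE path taken
    return inserts, n_ops - inserts
```

With, say, 10,000 operations over 100 keys, at most 100 of them can be inserts; the workload is almost entirely updates, which is why the insert/update balance is hard to control with this kind of script.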
I don't think that making this work for exclusion constraints is all that compelling; I'll take it, I guess (not that there is obviously a dichotomy between doing btree locking and doing exclusion constraints too), but I doubt people put "overlaps" operators in the predicates of DML very often *at all*, and doubt even more that there is actual demand for upserting there. I think that the reason that you prefer this design is almost entirely down to possible hazards with btree locking around what I've done (or, indeed, anything that approximates what I've done); maybe that's so obvious that it didn't even occur to you to mention it, but I think it should be acknowledged.

I don't think that using index locking of *some* form is unreasonable. Certainly, I think that from reading the literature (e.g. [1]) one can find evidence that btree page index locking as part of value locking seems like a common technique in many popular RDBMSs, and presumably forms an important part of their SQL MERGE implementations. As it says in that paper:

"""
Thus, non-leaf pages do not require locks and are protected by latches only. The remainder of this paper focuses on locks.
"""

They talk here about a very traditional System-R architecture - "Assumptions about the database environment are designed to be very traditional". Latches here are basically equivalent to our buffer locks, and what they call locks we call heavyweight locks. So I'm pretty sure many other *traditional* systems handle value locking by escalating a "latch" to a leaf-page-level heavyweight lock (it's often more granular too). I think that the advantages are fairly fundamental. I think that section "4.1 Locks on keys and ranges" of this paper is interesting. I've also found a gentler introduction to traditional btree key locking [2].
In that paper, in section "5 Protecting a B-tree’s logical contents", it is said:

"""
Latches must be managed carefully in key range locking if lockable resources are defined by keys that may be deleted if not protected. Until the lock request is inserted into the lock manager's data structures, the latch on the data structure in the buffer pool is required to ensure the existence of the key value. On the other hand, if a lock cannot be granted immediately, the thread should not hold a latch while the transaction waits. Thus, after waiting for a key value lock, a transaction must repeat its root-to-leaf search for the key.
"""

So I strongly suspect that some other systems have found it useful to escalate from a latch (buffer/page lock) to a lock (heavyweight lock).

I have some concerns about what you've done that may limit my immediate ability to judge performance, and the relative merits of both approaches generally. Now, I know you just wanted to sketch something out, and that's fine, but I'm only sharing my thoughts. I am particularly worried about the worst case (for either approach), particularly with more than 1 unique index. I am also worried about livelock hazards (again, in particular with more than 1 index) - I am not asserting that they exist in your patch, but they are definitely more difficult to reason about. Value locking works because once a page lock is acquired, all unique indexes are inserted into. Could you have two upserters livelock each other in practice with two unique indexes with 1:1 correlated values (i.e. 2 unique indexes that might almost work as 1 composite index)? That is a reasonable usage of upsert, I think.

With your patch, we never wait on another transaction if there is a conflict when inserting - we just do the usual UNIQUE_CHECK_PARTIAL thing (we don't wait for the other xact during btree insertion).
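[Editor's note: the protocol in the quotation above can be sketched in a few lines. This is purely an illustrative model with invented names - nothing here corresponds to actual PostgreSQL or nbtree structures: register the key-value lock while holding the latch, but never wait while latched; if the lock isn't granted at once, drop the latch, wait, and repeat the root-to-leaf search.]

```python
# Toy model of the latch-then-lock escalation quoted above.  A real
# system would block in place of the waiters.append() and then retry
# the descent from the root, since the leaf may have split meanwhile.

class Leaf:
    def __init__(self):
        self.latched = False

class ToyTree:
    def __init__(self):
        self.leaf = Leaf()
    def search(self, key):
        return self.leaf           # stand-in for the root-to-leaf descent

class ToyLockTable:
    def __init__(self):
        self.owner = {}            # key value -> owning xid
    def try_lock(self, key, xid):
        return self.owner.setdefault(key, xid) == xid

def acquire_key_lock(tree, locks, key, xid, waiters):
    leaf = tree.search(key)
    leaf.latched = True                  # latch ensures the key still exists
    granted = locks.try_lock(key, xid)   # register the lock request
    leaf.latched = False                 # never hold a latch while waiting
    if granted:
        return leaf
    waiters.append((xid, key))           # would block, then re-search
    return None
```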
This breaks the IGNORE case (how does it determine the final outcome of the transaction that inserted what may be a conflict, iff the conflict was only found during insertion?), which would probably be fine for our purposes if that were the only issue, but I have concerns about its effects on the ON DUPLICATE KEY LOCK FOR UPDATE case too. I don't like that an upserter's ExecInsertIndexTuples() won't wait on other xids generally, I think. Why should the code btree-insert even though it knows it's going to kill the heap tuple? It makes things very hard to reason about. If you are mostly thinking about exclusion constraints here, then I'm not sure that it's okay even at this point that the IGNORE case doesn't work there, because IGNORE is the only thing that makes much sense for exclusion constraints.

The unacceptable-deadlocking-pattern generally occurs when we try to lock two different row versions. Your patch is fairly easy to make deadlock. Regarding this:

    /*
     * At this point we have either a conflict or a potential conflict. If
     * we're not supposed to raise error, just return the fact of the
     * potential conflict without waiting to see if it's real.
     */
    if (errorOK && !wait)
    {
        conflict = true;
        if (conflictTid)
            *conflictTid = tup->t_self;
        break;
    }

Don't we really just have only a potential conflict? Even if conflictTid is committed? I think it's odd that you insert btree index tuples without ever worrying about waiting (which is what breaks the IGNORE case, you might say). UNIQUE_CHECK_PARTIAL never gives an xid to wait on from within _bt_check_unique(). Won't that itself make other sessions block pending the outcome of our transaction (in non-upserting ExecInsertIndexTuples(), or in ExecCheckIndexConstraints())? Could that be why your patch deadlocks unreasonably (that is, in the way you've already agreed, in your most recent mail, isn't okay)?
Isn't it only okay that UNIQUE_CHECK_PARTIAL might do that for deferred unique indexes because of the re-checking, which may then abort the xact? How will this work?

    * XXX: If we know or assume that there are few duplicates, it would
    * be better to skip this, and just optimistically proceed with the
    * insertion below. You would then leave behind some garbage when a
    * conflict happens, but if it's rare, it doesn't matter much. Some
    * kind of heuristic might be in order here, like stop doing these
    * pre-checks if the last 100 insertions have not been duplicates.

...when you consider that the only place a tid can come from is this pre-check?

Anyway, consider the following simple test case of your patch.

postgres=# create unlogged table foo
(
  a int4 primary key,
  b int4 unique
);
CREATE TABLE

If I run the attached pgbench script like this:

pg@hamster:~/pgbench-tools/tests$ pgbench -f upsert.sql -n -c 50 -T 20

I can get it to deadlock (and especially to throw unique constraint violations) like crazy. Single unique indexes seemed okay, though I have my doubts that only allowing one unique index gets us far, or that it will be acceptable to have the user specify a unique index in DML or something. I discussed this with Robert in relation to his design upthread. Multiple unique constraints were *always* the hard case. I mean, my patch only really does something unconventional *because* of that case; one unique index is easy.

Leaving discussion of value locking aside, just how rough is this revision of yours? What do you think of certain controversial aspects of my design that remain unchanged, such as the visibility trick (as actually implemented, and/or just in principle)? What about the syntax itself? It is certainly valuable to have additional MERGE-like functionality above and beyond the basic "upsert", not least for multi-master conflict resolution with complex resolution policies, and this syntax gets us much of that.
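[Editor's note: the reason the second unique index is what bites can be pictured as a cycle in the wait-for graph. The following is the editor's own toy model with invented identifiers, not either patch's logic: if two sessions encounter conflicts via the two indexes in different orders, each can hold the row lock the other needs.]

```python
# Toy wait-for-graph check for the two-unique-index hazard: session A
# holds the lock it acquired via index "a" and waits for the row found
# via index "b", while session B holds "b"'s row and waits for "a"'s.

def has_deadlock(holds, wants):
    """holds/wants map a session id to a set of lock tags; detect a
    two-session cycle in the implied wait-for graph."""
    sessions = list(wants)
    for i, s1 in enumerate(sessions):
        for s2 in sessions[i + 1:]:
            if wants[s1] & holds[s2] and wants[s2] & holds[s1]:
                return True
    return False
```

This is also a picture of why inserting into all unique indexes under one value lock sidesteps the problem: with a single lock there is never a second lock tag on which to form the cycle.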
How would you feel about making it possible for the UPDATE to use a tidscan, by projecting out the tid that caused a conflict, as a semi-documented optimization? It might be unfortunate if someone tried to UPDATE based on that ctid twice, but that is a less common requirement. It is kind of an abuse of notation, because of course you're not supposed to be projecting out the conflict-causer but the rejects, but perhaps we can live with that, if we can live with the basic idea.

I'm sorry if my thoughts here are not fully developed, but it's hard to pin this stuff down, especially since I'm guessing what is and isn't essential to your design in this rough sketch.

Thanks

[1] http://zfs.informatik.rwth-aachen.de/btw2007/paper/p18.pdf
[2] http://www.hpl.hp.com/techreports/2010/HPL-2010-9.pdf

--
Peter Geoghegan