Syntax of INSERT...ON DUPLICATE KEY LOCK FOR UPDATE - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Syntax of INSERT...ON DUPLICATE KEY LOCK FOR UPDATE |
Date | |
Msg-id | CAM3SWZRzWH5gbAVoDHcCovf75Pjz9GsOhDOZBA1KNM4S2jgcSQ@mail.gmail.com Whole thread Raw |
Responses |
Re: Syntax of INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
Re: Syntax of INSERT...ON DUPLICATE KEY LOCK FOR UPDATE |
List | pgsql-hackers |
Someone suggested to me that I solicit opinion on the chosen syntax of INSERT...ON DUPLICATE KEY LOCK FOR UPDATE on a separate thread. I'll do that here, while also giving a formal user-level definition of the feature. I'd like to solicit feedback from a wider set of people than those participating in the main thread, while avoiding talking about arcane details around locking which have dominated discussions up until this point. The patch proposed adds an optional ON DUPLICATE KEY LOCK FOR UPDATE clause. It also extends the RETURNING clause to support projecting "REJECTS". Idiomatic wCTE usage, through which it's possible to build an "UPSERT", might look like: CREATE TABLE foo ( a int4 PRIMARY KEY, b int4, c text ); WITH rej AS ( INSERT INTO foo (a, b, c) -- Multiple rows proposed for insertion - may insert or update each VALUES (44, 1078, 'insert'), (55, 1088, 'insert') ON DUPLICATE KEY LOCK FOR UPDATE RETURNING REJECTS a) UPDATE foo SET c = 'update' FROM rej WHERE foo.a = rej.a; This has what I like to call the fundamental upsert property: You always either get an insert, or an update. People are already incorrectly using wCTEs like this; I thought I'd make it work correctly. I believe that we'll be well served by the flexibility of what I've proposed. In many respects it is similar to SQL MERGE. For example, I could have deleted rather than updated, or I could have had a more complex predicate in the update clause, so that updates did not occur on all rejecting rows. While I'm not particularly attached to the syntax, I do think that the flexibility is a key strength. I recently suggested that rather than RETURNING REJECTS, we could have a REJECTING clause, which would see a DML statement project strictly the complement of what RETURNING projects in the same context. So perhaps you could also see what RETURNING would not have projected because a before row trigger returned NULL (i.e. when a before trigger indicates to not proceed with insertion). That is certainly more general, and so is perhaps preferable. It's also less verbose, and it seems less likely to matter that we'll need to make REJECTING a fully reserved keyword, as compared to REJECTS. (RETURNING is already a fully reserved keyword not described by the standard, so this makes a certain amount of sense to me). If nothing else, REJECTING is more terse than RETURNING REJECTS. Like the MySQL feature "INSERT...ON DUPLICATE KEY UPDATE", duplicates are defined as would-be violators of any and all unique indexes. Like the MySQL feature, the user is generally obligated to make sure they know ahead of time which unique index any violation may come from, or else they may update the wrong row (i.e. the row they ultimately update is not already locked). Unlike the MySQL feature, expert users have some capacity to recover if that problem is anticipated, because we also may project out the rejecting row's ctid. Notably the wCTE pattern, with a ctid join condition doesn't work very well, because, as src/backend/optimizer/path/tidpath.c says: * There is currently no special support for joins involving CTID; in* particular nothing corresponding to best_inner_indexscan(). Since it's* not very useful to store TIDs of one table in another table, there* doesn't seem tobe enough use-case to justify adding a lot of code* for that. You end up with a seqscan, not a tidscan, so I don't think every novice user is going to try this as a premature optimization, without appreciating the hazards of tid updates. Projecting the rejecting row's tid is an expert level feature, added mostly with things like multi-master replication conflict resolution in mind. Those use-cases will find this feature quite important, and will particularly value the flexibility. In fact, I think for that use-case, it's even more useful than SQL MERGE for a couple of reasons, in particular the capability to defer doing anything with the locked row until later commands. This feature is not supposed to fully satisfy those calling for SQL MERGE. I anticipate that we'll still get MERGE in a future release, and the implementation anticipates this as well. More formally, what the feature does is: * Ensure that a row is either inserted successfully, or that if an effort to do so was unsuccessful, the first conclusively committed tuple with a conflicting value is exclusive locked. Not all conflicting tuples are locked, just the first, although the order in which we check unique indexes for conflicts is well defined (at least with my implementation, where when we find a conflicting TID, it must really be the first one at that juncture, because all previous unique indexes are value locked). * Ensure that in READ COMMITTED mode, the locked row is always visible. Since we can get a lock on the tuple blamed for insertion not proceeding, it must be conclusively committed and not updated or deleted by anyone else, but in and of itself that isn't sufficient. The tuple's xact may logically be still-in-progress to our snapshot, and it would not be acceptable to have it be impossible to update for that reason. So there is a special case adjustment to the general semantics of MVCC snapshots, just to facilitate this feature. Higher isolation levels actively forbid themselves from proceeding when they observe that they'll have to avail of this special rule to update, by throwing a serialization failure. -- Peter Geoghegan
pgsql-hackers by date: