Re: INSERT ... ON CONFLICT {UPDATE | IGNORE} - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: INSERT ... ON CONFLICT {UPDATE | IGNORE} |
Date | |
Msg-id | CAM3SWZRSH2DqsbPxW0ch7rQbaKQW=OS1z6i=gzWmBKZ=Lk-r+w@mail.gmail.com Whole thread Raw |
In response to | Re: INSERT ... ON CONFLICT {UPDATE | IGNORE} (Peter Geoghegan <pg@heroku.com>) |
Responses |
Re: INSERT ... ON CONFLICT {UPDATE | IGNORE}
Re: INSERT ... ON CONFLICT {UPDATE | IGNORE} |
List | pgsql-hackers |
V 1.3 is attached. Like Simon, I think that it's premature to commit to one particular value locking implementation. Personally, I think that approach #2 is what we'll end up using, but it makes no sense to not maintain both at once, since it requires relatively little effort to do so. At the very least it's a useful tool for reviewers, who would otherwise be denied the opportunity to test whether any given concurrency problem was attributable to the value locking implementation, or something else. So there are two variants of V 1.3 attached - one uses an unchanged value locking implementation #1, while the other an unchanged implementation #2. Highlights ======== * No more "WITHIN unique_index_name". There is a new syntax that supersedes it. Unique index inference based on columns (or expressions) is *mandatory* for the ON CONFLICT UPDATE variant. It remains optional for the IGNORE variant, because I think someone could very reasonably not care which unique index was implicated. As discussed, this implies that partial unique indexes are no longer supported (but expression indexes work just fine). This may be revisited, but I suggest doing so in a later release. Example of merging on the unique index on column "name": postgres=# explain insert into capitals(name, population, altitude) values ('Riga', 1000, 5) on conflict (name) update set altitude = excluded(altitude); QUERY PLAN ---------------------------------------------------------------------- Insert on capitals (cost=0.00..0.01 rows=1 width=0) -> Result (cost=0.00..0.01 rows=1 width=0) -> Conflict Update on capitals (cost=0.00..1.01 rows=1 width=52) (3 rows) When we cannot infer a unique index, it looks like this: postgres=# explain insert into capitals(name, population, altitude) values ('Riga', 1000, 5) on conflict (population) update set altitude = excluded(altitude); ERROR: 42P10: could not infer which unique index to use from expressions/columns provided for ON CONFLICT LINE 1: ...e, population, altitude) values ('Riga', 1000, 5) on conflic... ^ HINT: Partial unique indexes are not supported. LOCATION: transformConflictClause, parse_clause.c:2407 * No more planner kludges. Index paths are never created for the auxiliary UPDATE plan to begin with. To be tidy, TID scan paths are never added either. Unless the UPDATE will never proceed due to a tautological predicate like "WHERE false", we're guaranteed to have a "sequential scan" in a fairly principled way (as before, we just use this with EvalPlanQual(); it's just an implementation detail). I am not entirely qualified to say so, but I think that this makes my modifications to the optimizer look quite reasonable. The kludge of enforcing various restrictions (e.g. on subqueries appearing in the UPDATE) in the optimizer is also removed. We prefer to enforce everything during parse analysis. This also gets us better error messages, which is nice. * Sane (although limited) support for table inheritance and updatable views. However, in general user-defined rules are unsupported - I cannot see how that could be made sane. The IGNORE variant works for updatable views, and for inheritance relations with children (provided that there is no inference required, which effectively makes the UPDATE variant unsupported). However, both IGNORE and UPDATE variants work for relations that happen to be in an inheritance hierarchy, provided they have no children. I think it's fine to only support the IGNORE variant for relations with inheritance children, because even when users have the "partitioning pattern" use-case, there is no principled way of telling the difference between a vanilla INSERT, and an INSERT with an ON CONFLICT UPDATE clause from within the custom redirection trigger function. Also, unique indexes already only work at the relation level with inheritance - that's a long-standing limitation. And so, in general INSERTs better have the right inheritance child as their target - this is no more and no less true when there is an ON CONFLICT UPDATE clause (note that I'm talking about the "object orientated" inheritance use-case here, and not the "partitioning pattern" use-case - with the latter, it's all up to the trigger function to do the right thing). There may currently be an "UPDATE ONLY", but there is no "INSERT ONLY" - why should I add one? * Both INSERT and UPDATE sets of statement level triggers fire for INSERT with ON CONFLICT UPDATE. The number of rows affected doesn't matter, nor does it matter how they were or were not affected. This certainly seems like the correct behavior. Per-row triggers work the same as before, since far more thought went into their behavior earlier. This incorporates feedback from Kevin. * CONFLICTING() is renamed to EXCLUDED(). I know that some people wanted me to go a different way with this. I think that there are very good practical reasons not to [1], as well as good reasons related to design, but I still accept that CONFLICTING() isn't very descriptive. This spelling seems a lot better. Clean-up ======= If you take a look at the EXPLAIN output above, you'll see that the sequential scan node does not appear. Basically, I'm back to suppressing the implementation detail of that never-executed "sequential scan", but the new approach is far better than earlier approaches. Certain things actually associated with the sequential scan are now attributed to the parent (auxiliary) ModifyTable UPDATE node. Example (incidentally, note that "key" is used to infer a unique index to take as an arbiter index): postgres=# explain insert into upsert values (1000, 'Plucky') on conflict (key) update set val = excluded(val) where key = 5; QUERY PLAN --------------------------------------------------------------------- Insert on upsert (cost=0.00..0.01 rows=1 width=0) -> Result (cost=0.00..0.01 rows=1 width=0) -> Conflict Update on upsert (cost=0.00..25.38 rows=6 width=36) Filter: (key = 5) (4 rows) This seems reasonable to me. Including the "(never executed)" sequential scan would be very confusing to users. Inference ======= Unique index inference (i.e. the way we figure out *which* unique index to use) occurs during parse analysis. I think it would be inappropriate, and certainly inconvenient to do it during planning. I maintain that ON CONFLICT DML statements have a legitimate semantic dependence on particular unique indexes, which makes this appropriate. I don't know about others, but my only problem with naming unique indexes directly was that it is unmaintainable, and very ugly. In principle, I think this semantic dependence is quite reasonable, and reflects the reality of how we all expect this to work. There are comments on the implications for plan caching and how that relates to where and when we perform unique index inference. Documentation =========== The documentation has been updated, incorporating feedback. I also made the cardinality violation error a lot clearer than before, since Craig said that was unclear. Tests ==== Many tests were added. I've added a couple of new isolation tests. insert-conflict-update-3 should be of particular interest. That test illustrates the visibility issues with the WHERE clause that I've already highlighted as a possible concern [2]. Unique index inference tests will also give you a fair idea of how flexible it is. Remaining open items ================= Apart from the obvious issue of value locking (i.e. verifying the correctness of its implementation in general), the only open items are: * RLS needs to be considered. I have yet to give it any real thought. * I could probably do better at postgres_fdw support. That seems like something that could be followed up on later, because it's clearly just about a Simple Matter of Programming. In summary, I was able to remove a lot of TODO/FIXME items here - almost all of them. I'm pretty happy about that. I'll have to edit the UPSERT wiki page, to strike out many open items... [1] http://www.postgresql.org/message-id/CAM3SWZQhiXQi1osT14V7spjQrUpmcnRtbXJe846-EB1bC+9i1g@mail.gmail.com [2] https://wiki.postgresql.org/wiki/UPSERT#Visibility_issues_and_the_proposed_syntax_.28WHERE_clause.2Fpredicate_stuff.29 -- Peter Geoghegan
Attachment
pgsql-hackers by date: