Thread: INSERT ... ON CONFLICT {UPDATE | IGNORE}
Attached WIP patch extends the INSERT statement, adding a new ON CONFLICT {UPDATE | IGNORE} clause. This allows INSERT statements to perform UPSERT operations (if you want a more formal definition of UPSERT, I refer you to my pgCon talk's slides [1], or the thread in which I delineated the differences between SQL MERGE and UPSERT [2]). The patch builds on previous work in this area, and incorporates feedback from Kevin and Andres.

Overview
========

Example usage:

INSERT INTO upsert(key, val) VALUES(1, 'insert')
ON CONFLICT UPDATE SET val = 'update';

Essentially, the implementation has all stages of query processing track some auxiliary UPDATE state. So, for example, during parse analysis, UPDATE transformation occurs in an ad-hoc fashion tightly driven by the parent INSERT, but using the existing infrastructure (i.e. transformStmt()/transformUpdateStmt() is called, and is insulated from having to care about the feature as a special case).

There are some restrictions on what this auxiliary update may do, but FWIW there are considerably fewer than those that the equivalent MySQL or SQLite feature imposes on their users. All of the following SQL queries are valid with the patch applied:

-- Nesting within wCTE:
WITH t AS (
    INSERT INTO z SELECT i, 'insert'
    FROM generate_series(0, 16) i
    ON CONFLICT UPDATE SET v = v || 'update' -- use of operators/functions in targetlist
    RETURNING * -- only projects inserted tuples, never updated tuples
)
SELECT * FROM t JOIN y ON t.k = y.a ORDER BY a, k;

-- IGNORE variant:
INSERT INTO upsert(key, val) VALUES(1, 'insert') ON CONFLICT IGNORE;

-- Predicate within UPDATE auxiliary statement (row is still locked
-- when the UPDATE predicate isn't satisfied):
INSERT INTO upsert(key, val) VALUES(1, 'insert')
ON CONFLICT UPDATE WHERE val != 'delete';

As with SQL MERGE (at least as implemented in other systems), subqueries may not appear within the UPDATE's targetlist, nor may they appear within the special WHERE clause.
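For example, both of the following would be rejected by the patch (sketches only, against a hypothetical "lookup" table):

-- Disallowed: subquery in the auxiliary UPDATE's targetlist
INSERT INTO upsert(key, val) VALUES(1, 'insert')
ON CONFLICT UPDATE SET val = (SELECT v FROM lookup LIMIT 1);

-- Disallowed: subquery expression in the special WHERE clause
INSERT INTO upsert(key, val) VALUES(1, 'insert')
ON CONFLICT UPDATE SET val = 'update'
WHERE val IN (SELECT v FROM lookup);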
But the "INSERT part" of the query has no additional limitations, so you may for example put subqueries within a VALUES() clause, or INSERT...SELECT...ON CONFLICT UPDATE... just as you'd expect. INSERT has been augmented with a new clause, but that clause does not unreasonably fail to play nice with any other aspect of insertion. (Actually, that isn't quite true, since at least for now table inheritance, updatable views and foreign tables are unsupported. This can be revisited.)

I think that without initially realizing it, I copied the SQLite syntax [3]. However, unlike with that SQLite feature, CONFLICT only refers to a would-be duplicate violation, and not a violation of any other kind of constraint.

How this new approach works (Executor and Optimizer stuff)
==========================================================

During the execution of the parent ModifyTable, a special auxiliary subquery (the UPDATE ModifyTable) is considered as a special case. This is not a subplan of the ModifyTable node in the conventional sense, and so does not appear within EXPLAIN output. However, it is more or less independently planned, and entirely driven by the INSERT ModifyTable. ExecModifyTable() is never called with this special auxiliary plan state passed directly. Rather, its parent manages the process as the need arises.

ExecLockUpdateTuple() locks and potentially updates tuples, using the EvalPlanQual() mechanism (even at higher isolation levels, with appropriate precautions). The per-tuple expression context of the auxiliary query/plan is used with EvalPlanQual() from within ExecLockUpdateTuple() (the new routine tasked with locking and updating on conflict). There is a new ExecUpdate() call site within ExecLockUpdateTuple(). Given the restrictions necessarily imposed on this pseudo-rescanning (principally the outright rejection of anything that necessitates PARAM_EXEC parameters during planning), this is safe, as far as I'm aware.
It is convenient to be able to re-use infrastructure in such a way as to more or less handle the UPDATE independently, driven by the INSERT, except for execution, which is more directly handled by the INSERT (i.e. there is no ExecModifyTable() call in respect of this new auxiliary ModifyTable plan). Granted, it is kind of bizarre that the auxiliary query may have a more complex plan than is necessary for our purposes, but it doesn't actually appear to be a problem when "rescanning" (like a SELECT FOR UPDATE/FOR SHARE node, we call EvalPlanQualSetTuple() directly).

It is likely worthwhile to teach the optimizer that we really don't care about how the one and only base rel within the UPDATE auxiliary subquery (the target table) is scanned, if only to save a few cycles. I have (temporarily) hacked the optimizer to prevent index-only scans, which are problematic here, by adding disable_cost when a query parse tree that uses the feature is seen. Although what I've done is a temporary kludge, the basic idea of forcing a particular type of relation scan has a precedent: UPDATE WHERE CURRENT OF artificially forces a TID scan, because only a TID scan will work correctly there. I couldn't come up with a convenient way to artificially inject disable_cost into alternative scan types, in the less invasive style of isCurrentOf, because there is no convenient qual to target within cost_qual_eval().

As in previous incarnations, we lock each tuple (although, of course, only with the UPDATE variant). We may or may not also actually proceed with the update, depending on whether or not the user-specified special update predicate (if any) is satisfied.
But if we do, EvalPlanQual() is (once the tuple is locked) only ever evaluated on a conclusively committed and locked-by-us conflict tuple as part of the process of updating, even though it's possible for the UPDATE predicate to be satisfied where conceivably it would not be satisfied by the tuple version actually visible to the command's MVCC snapshot.

I think this is the correct behavior. We all seem to be in agreement that we should update at READ COMMITTED if *no* version of the tuple is visible. It seems utterly arbitrary to me to suggest that on the one hand it's okay to introduce one particular "MVCC violation", but not another equivalent one. The first scenario is one in which we update despite our update's (or rather insert's) "predicate" not being satisfied (according to our MVCC snapshot). The second scenario is one in which the same "predicate" is also not satisfied according to our MVCC snapshot, but in a slightly different way. Why bother introducing a complicated distinction, if it's a distinction without a difference? I'd rather have a behavior that is consistent, easy to reason about, and easy to explain. And so, the predicate is considered once, after conclusively locking a conflict tuple.

It feels natural and appropriate to me that if the special UPDATE qual isn't satisfied, we still lock the tuple. After all, in order to make a conclusive determination about the qual not being satisfied, we need to lock the tuple. This happens to insulate ExecUpdate() from having to care about "invisible tuples", which are now possible (although we still throw an error, just with a useful error message that phrases the problem in reference to this new feature).

Of course, at higher isolation levels serialization errors are thrown when something inconsistent with the higher level's guarantees would otherwise need to occur (even for the IGNORE variant).
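To make the ordering concrete, here is a sketch of the intended READ COMMITTED behavior (assuming the upsert table from the earlier examples, where key 1 already holds val = 'delete'):

BEGIN;
INSERT INTO upsert(key, val) VALUES(1, 'insert')
ON CONFLICT UPDATE SET val = 'update' WHERE val != 'delete';
-- The conflicting row is locked first; only then is the special
-- WHERE clause evaluated, against the latest committed tuple
-- version (not necessarily the version visible to our snapshot).
-- Here the qual isn't satisfied, so no update occurs -- but the
-- row remains locked until COMMIT/ABORT, blocking concurrent
-- updaters.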
Still, interactions with SSI, and preserving the guarantees of SSI, should probably be closely considered by a subject matter expert.

Omission
========

The patch currently lacks a way of referencing datums rejected for insertion when updating. The way MySQL handles the issue seems questionable. They allow you to do something like this:

INSERT INTO upsert (key, val) VALUES (1, 'val')
ON DUPLICATE KEY UPDATE val = VALUES(val);

The implication is that the updated value comes from the INSERT's VALUES() list, but emulating that seems like a bad idea. In general, at least with Postgres it's entirely possible that the values rejected differ from the values appearing in the VALUES() list, due to the effects of before triggers. I'm not sure whether or not we should assume equivalent transformations during any UPDATE before triggers. This is an open item. I think it makes sense to deal with it a bit later.

"Value locking"
===============

To date, on-list discussion around UPSERT has almost exclusively concerned what I've called "value locking"; the idea of locking values in unique indexes in the abstract (to establish the right to insert ahead of time). There was some useful discussion on this question between myself and Heikki back around December/January. Ultimately, we were unable to reach agreement on an approach and discussion tapered off. However, Heikki did understand the concerns that informed my design. He recognized the need to be able to easily *release* value locks, so as to avoid "unprincipled deadlocks", where under high concurrency there are deadlocks between sessions that only UPSERT a single row at a time. I'm not sure how widely appreciated this point is, but I believe that Heikki appreciates it. It is a very important point in my opinion. I don't want an implementation that is in any way inferior to the "UPSERT looping subxact" pattern (i.e. the plpgsql thing that the docs suggest).
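For reference, the looping-subxact pattern in question is essentially the docs' merge_db() example, which retries until either the UPDATE or the INSERT succeeds:

CREATE FUNCTION merge_db(k INT, data TEXT) RETURNS VOID AS
$$
BEGIN
    LOOP
        -- first try to update the key
        UPDATE db SET b = data WHERE a = k;
        IF found THEN
            RETURN;
        END IF;
        -- not there, so try to insert the key; if someone else
        -- inserts the same key concurrently we get a unique-key
        -- failure, and aborting the implicit subxact releases
        -- our locks before we loop to try the UPDATE again
        BEGIN
            INSERT INTO db(a, b) VALUES (k, data);
            RETURN;
        EXCEPTION WHEN unique_violation THEN
            -- do nothing, and loop to try the UPDATE again
        END;
    END LOOP;
END;
$$ LANGUAGE plpgsql;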
When we left off, Heikki continued to favor an approach that involved speculatively inserting heap tuples, and then deleting them in the event of a conflict. This design was made more complicated when the need to *release* value locks became apparent (Heikki ended up making some changes to HeapTupleSatisfiesDirty(), as well as sketching a design for what you might call a "super delete", where xmin can be set to InvalidTransactionId for speculatively-inserted heap tuples). After all, it wasn't as if we could abort a subxact to release locks, which is what the "UPSERT looping subxact" pattern does. I think it's fair to say that that design became more complicated than initially anticipated [4] [5].

Anyway, the greater point here is that fundamentally, AFAICT Heikki and I were in agreement. Once you buy into the idea that we must avoid holding on to "value locks" of whatever form - as Heikki evidently did - then exactly what form they take is ultimately only a detail. Granted, it's a very important detail, but a detail nonetheless. It can be discussed entirely independently of all of this new stuff, and thank goodness for that. If anyone finds my (virtually unchanged) page heavyweight lock based value locking approach objectionable, I ask that the criticism be framed in a way that makes a sharp distinction between each of the following:

1. You don't accept that value locks must be easily released in the event of a conflict. Is anyone in this camp? It's far from obvious to me what side of this question Andres is on at this stage, for example. Robert might have something to say here too.

2.
Having taken into account the experience of myself and Heikki, and all that is implied by taking that approach ***while avoiding unprincipled deadlocks***, you continue to believe that an approach based on speculative heap insertion, or some alternative scheme, is better than what I have done to the nbtree code here, or you otherwise dislike something about the proposed value locking scheme. You accept that value locks must be released, and released easily, in the event of a conflict, but like Heikki you just don't like what I've done to get there.

Since we can (I believe) talk about the value locking aspect and the rest of the patch independently, we should do so...unless you're in camp 1, in which case I guess that we'll have to thrash it out.

Syntax, footguns
================

As I mentioned, I have incorporated feedback from Kevin Grittner. You may specify a unique index to merge on from within the INSERT statement, thus avoiding the risk of inadvertently having the update affect the wrong tuple due to the user failing to consider that there was a would-be unique violation within some other unique index constraining some other attribute. You may write the DML statement like this:

INSERT INTO upsert(key, val) VALUES(1, 'insert')
ON CONFLICT WITHIN upsert_pkey UPDATE SET val = 'update';

I think that there is a good chance that at least some people will want to make this mandatory. I guess that's fair enough, but I *really* don't want to *mandate* that users specify the name of their unique index in DML, for obvious reasons. Perhaps we can come up with a more tasteful syntax that covers all interesting cases (consider the issues with partial unique indexes and before triggers, for example, where a conclusion reached about which index to use during parse analysis may subsequently be invalidated by user-defined code, or ambiguous specifications in the face of overlapping attributes between two unique composite indexes, etc).
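To illustrate the kind of ambiguity at issue, consider a hypothetical schema with overlapping unique composite indexes:

CREATE TABLE t(a int, b int, c int);
CREATE UNIQUE INDEX t_a_b ON t(a, b);
CREATE UNIQUE INDEX t_b_c ON t(b, c);
-- A hypothetical column-based specification naming only (b)
-- matches neither index outright; and with a partial unique
-- index, a BEFORE trigger can change the row so that the index
-- chosen during parse analysis no longer applies.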
The Right Thing is far from obvious, and there is very little to garner from other systems, since SQL MERGE promises essentially nothing about concurrency, both as specified by the standard and in practice. You don't need a unique index at all, and as I showed in my pgCon talk, there are race conditions even for a trivial UPSERT operation in all major SQL MERGE implementations.

Note that making merging on one particular unique index mandatory (via syntax) buys the implementation no useful leeway. Just for example, the unprincipled deadlocks test case that illustrated the problem with early "promise tuple" style approaches to value locking [6] involved only a single unique index. AFAICT, the question of whether or not this should be mandatory is just a detail of the feature's high level design, as opposed to something expected to significantly influence the implementation.

Testing, performance
====================

As you'd expect, I've included both isolation tests and regression tests covering a reasonable variety of cases. In addition, stress testing is an important part of my testing strategy. Reviewers are encouraged to try out these test bash scripts:

https://github.com/petergeoghegan/upsert

(Interested hackers should request collaborator status on that Github project from me privately. I welcome new, interesting test cases.)

The performance of the patch seems quite good, and is something that these stress-testing bash scripts also test. Upserts are compared against "equivalent" inserts when we know we'll never update, and against "equivalent" updates when we know we'll never insert. On an 8 core test server, I can sustain ~90,000 ordinary insert transactions per second on an unlogged table defined as follows:

create unlogged table foo
(
    merge serial primary key,
    b int4,
    c text
);

In all cases pgbench uses 8 clients (1 per CPU core). With "equivalent" upserts, it's about ~66,000 TPS.
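The "equivalent upsert" numbers come from driving statements of roughly this shape through pgbench (a sketch only; the real workloads are in the Github scripts above):

-- upsert.sql -- run with something like:
--   pgbench -n -f upsert.sql -c 8 -j 8 -T 60
-- With a serial primary key, every statement takes the insert
-- path, so this compares pure UPSERT overhead against an
-- "equivalent" plain INSERT.
INSERT INTO foo(b, c) VALUES(1, 'txt') ON CONFLICT UPDATE SET b = 2;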
But this is a particularly unsympathetic case, because I've deliberately exaggerated the effects of heavyweight lock contention on leaf pages by using a serial primary key. Plus, there's the additional planning and parsing overhead. When comparing updating with upserting, it's a similar story. 100,000 tuples are pre-inserted in each case. I can sustain ~98,000 TPS with plain updates, or ~70,000 TPS with "equivalent" upserts. B-Tree index page heavyweight lock contention probably explains some of the difference between "UPSERT inserts" and "UPSERT updates".

Interlocking with VACUUM, race conditions
=========================================

In previous revisions, when we went to lock + update a tuple, no "value locks" were held, and neither were any B-Tree page buffer pins, because they were both released at the same time (recall that I call my heavyweight lock on B-Tree leaf pages a value lock). We still do that (unprincipled deadlocks are our only alternative), but now hold on to the pin for longer, until after tuple locking. Old versions of this patch used to sit on the B-Tree buffer pin to prevent concurrent deletion only as long as value locks were held, but maybe it isn't good enough to sit on the pin only until before we lock/update, as value locks are released: dropping the pin implies that the heap tuple can physically go away, and in general the same TID may then contain anything.

We may have to interlock against VACUUM by sitting on the B-Tree buffer pin (but not the value lock) throughout locking + update. That makes it impossible for the heap tuple slot to fail to relate to the tuple from the B-Tree that is under consideration for locking/updating. Recall that we aren't quite dealing with MVCC semantics here, since in READ COMMITTED mode we can lock a conclusively committed + visible tuple with *no* version visible to our command's MVCC snapshot.
Therefore, it seems worth considering the possibility that the nbtree README's observations on the necessity of holding a pin to interlock against VACUUM (for non-MVCC snapshots) apply. In this revision we have two callbacks (or two calls to the same callback, with different effects): one to release value locks early, to avoid unprincipled deadlocks, and a second to finally release the last unneeded buffer pin.

Recall that when we find a conflict (within _bt_checkunique()), it must be conclusively committed and visible to new MVCC snapshots; we know at that juncture that it's live. The concern is that it might be deleted *and* garbage collected in the interim between finding the conflict tuple and locking it (in practice this interim period is only an instant). This is probably too paranoid, though: the fact that the upserter's transaction is running ought to imply that GetOldestXmin() returns an XID sufficient to prevent this. OTOH, I'm not sure that there exists anything that looks like a precedent for relying on blocking VACUUM in this manner, and it might turn out to be limiting to rely on this.

And, I hasten to add, my fix (sitting on a B-Tree pin throughout row locking) is in another way perhaps not paranoid enough: who is to say that our conflicting value is on the same B-Tree leaf page as our value lock? It might not be, since _bt_checkunique() looks at later B-Tree pages (the value locked page is merely "the first leaf page the value could be on"). Pinning the heavyweight lock page's buffer is certainly justified by the need for non-speculative inserters to see a flag that obligates them to acquire the heavyweight page lock themselves (see comments in patch for more), but this other reason is kind of dubious. In other words: I'm relying on the way VACUUM actually works to prevent premature garbage collection.
It's possible to imagine a world in which HeapTupleSatisfiesVacuum() is smart enough to realize that the tuple UPSERT wants to lock is not visible to anyone (assuming MVCC semantics, etc), and never can be. I've tentatively added code to keep a buffer pin for longer, but that's probably not good enough if we assume that it's necessary at all. Basically, I want to be comfortable about my rationale for it being okay that a "non-MVCC" "index scan" doesn't hold a pin, but right now I'm not. I was conflicted on whether or not I should include the "unpin later" logic at all; for now I've left it in, if only as a placeholder.

Needless to say, if there is a race condition you can take it that it's very difficult to isolate. FWIW, somewhat extensive stress-testing has revealed no bugs that you might associate with these problems, with and without extended buffer pinning, and with artificial random sleeps added at key points in an effort to make any race condition bugs manifest themselves. I have made a concerted effort to break the patch in that way, and I'm now running out of ideas. Running the stress tests (with random delays at key points in the code) for several days reveals no bugs. This is on the same dedicated 8 core server, with plenty of concurrency. It's probably a good idea to begin using my B-Tree verification tool [7] for testing...on the other hand, it doesn't know anything about MVCC, and will only detect the violation of invariants that are localized to the B-Tree code, at least at the moment.

Open items
==========

I already mentioned the inability to reference rejected rows in an UPDATE, as well as my unease about VACUUM interlocking, both of which are open items. Also, some of the restrictions that I already mentioned - on updatable views, inheritance, and foreign tables - are probably unnecessary. We should be able to come up with reasonable behavior for at least some of those.
Patch
=====

I thought that I went too long without posting something about all of this to the list to get feedback, and so I decided to post this WIP patch set. I've tried to break it up into pieces, but it isn't all that suitable for representing as cumulative commits. I've also tried to break up the discussion usefully (the question of how everything fits together at a high level can hopefully be discussed separately from the question of how "value locks" are actually implemented).

Thoughts?

[1] http://www.pgcon.org/2014/schedule/attachments/327_upsert_weird.pdf ("Goals for UPSERT in Postgres")
[2] http://www.postgresql.org/message-id/CAM3SWZRP0c3g6+aJ=YYDGYAcTZg0xA8-1_FCVo5Xm7hrEL34kw@mail.gmail.com
[3] https://sqlite.org/lang_conflict.html
[4] http://www.postgresql.org/message-id/CAM3SWZQoArVQGMi=v-jk3sBjsPg+wdjeUkM_6L5TZG_i9pyGzQ@mail.gmail.com
[5] http://www.postgresql.org/message-id/52B4AAF0.5090806@vmware.com
[6] http://www.postgresql.org/message-id/CAM3SWZShbE29KpoD44cVc3vpZJGmDer6k_6FGHiSzeOZGmTFSQ@mail.gmail.com
[7] http://www.postgresql.org/message-id/CAM3SWZRtV+xmRWLWq6c-x7czvwavFdwFi4St1zz4dDgFH4yN4g@mail.gmail.com

--
Peter Geoghegan
On 08/28/2014 04:43 AM, Peter Geoghegan wrote:
> -- Nesting within wCTE:
> WITH t AS (
>     INSERT INTO z SELECT i, 'insert'
>     FROM generate_series(0, 16) i
>     ON CONFLICT UPDATE SET v = v || 'update' -- use of
> operators/functions in targetlist
>     RETURNING * -- only projects inserted tuples, never updated tuples
> )
> SELECT * FROM t JOIN y ON t.k = y.a ORDER BY a, k;

Personally I would find it surprising if RETURNING did not also return the updated tuples. In many use cases for upsert the user does not care if the row was new or not. What I think would be useful is if all tuples were returned but there was some way to filter out only the inserted ones.

--
Andreas Karlsson
On Thu, Aug 28, 2014 at 7:29 AM, Andreas Karlsson <andreas@proxel.se> wrote:
> Personally I would find it surprising if RETURNING did not also return the
> updated tuples. In many use cases for upsert the user does not care if the
> row was new or not.

I'm not attached to that particular behavior, but it does seem kind of similar to the behavior of BEFORE triggers, where a NULL return value ("do nothing") will also cause RETURNING to not project the tuple.

--
Peter Geoghegan
On 08/28/2014 09:05 PM, Peter Geoghegan wrote:
> On Thu, Aug 28, 2014 at 7:29 AM, Andreas Karlsson <andreas@proxel.se> wrote:
>> Personally I would find it surprising if RETURNING did not also return the
>> updated tuples. In many use cases for upsert the user does not care if the
>> row was new or not.
>
> I'm not attached to that particular behavior, but it does seem kind of
> similar to the behavior of BEFORE triggers, where a NULL return value
> ("do nothing") will also cause RETURNING to not project the tuple.

I see. So we have three cases where we may or may not want to project a tuple:

1) The tuple was inserted
2) We got a conflict and updated the tuple
3) We got a conflict but skipped updating the tuple

My personal intuition was that (1) and (2) would be returned but not (3). But I am not sure if that is the most useful behavior.

--
Andreas Karlsson
On Wed, Aug 27, 2014 at 7:43 PM, Peter Geoghegan <pg@heroku.com> wrote:
> There are some restrictions on what this auxiliary update may do, but
> FWIW there are considerably fewer than those that the equivalent MySQL
> or SQLite feature imposes on their users.

I realized that I missed a few cases here. For one thing, the posted patch fails to arrange for the UPDATE post-parse-analysis tree representation to go through the rewriter stage (on the theory that user-defined rules shouldn't be able to separately affect the auxiliary UPDATE query tree), but rewriting is at least necessary so that rewriteTargetListIU() can expand a "SET val = DEFAULT" targetlist, as well as normalize the ordering of the UPDATE's tlist. Separately, the patch fails to defend against certain queries that ought to be disallowed, where a subselect is specified with a subquery expression in the auxiliary UPDATE's WHERE clause.

These are garden-variety bugs that aren't likely to affect the kind of high-level design discussion that I'm looking for here. I'll post a fixed version in a few days' time.

--
Peter Geoghegan
On Thu, Aug 28, 2014 at 8:05 PM, Peter Geoghegan <pg@heroku.com> wrote:
> I realized that I missed a few cases here. For one thing, the posted
> patch fails to arrange for the UPDATE post-parse-analysis tree
> representation to go through the rewriter stage (on the theory that
> user-defined rules shouldn't be able to separately affect the
> auxiliary UPDATE query tree), but rewriting is at least necessary so
> that rewriteTargetListIU() can expand a "SET val = DEFAULT"
> targetlist, as well as normalize the ordering of the UPDATE's tlist.
> Separately, the patch fails to defend against certain queries that
> ought to be disallowed, where a subselect is specified with a subquery
> expression in the auxiliary UPDATE's WHERE clause.

Attached revision fixes all of these issues. I've added regression tests for each bug, too, although all changes are rebased into my original commits.

I decided to explicitly rely on a simpler approach to VACUUM interlocking. I no longer bother holding on to a buffer pin for a period longer than the period that associated "value locks" are held, which was something I talked about at the start of this thread. There is a note on this added to the nbtree README, just after the master branch's current remarks on B-Tree VACUUM interlocking.

I've also pushed the responsibility for supporting this new feature on foreign tables onto FDWs themselves. The only writable FDW we currently ship, postgres_fdw, lacks support for the new feature, but this can be revisited in due course. My impression is that the task of adding support is not quite a straightforward matter of adding a bit more deparsing logic, but also isn't significantly more difficult than that.

--
Peter Geoghegan
On Wed, Aug 27, 2014 at 10:43 PM, Peter Geoghegan <pg@heroku.com> wrote:
> Example usage:
>
> INSERT INTO upsert(key, val) VALUES(1, 'insert') ON CONFLICT UPDATE
> SET val = 'update';

I think that syntax is a dramatic improvement over your previous proposals. The only part I don't entirely like is this:

> INSERT INTO upsert(key, val) VALUES(1, 'insert') ON CONFLICT WITHIN
> upsert_pkey UPDATE SET val = 'update';

It seems to me that it would be better to specify a conflicting column set rather than a conflicting index name.

I don't have much in the way of comments about the implementation, at least not right at the moment, but...

> Essentially, the implementation has all stages of query processing
> During the execution of the parent ModifyTable, a special auxiliary
> subquery (the UPDATE ModifyTable) is considered as a special case.
> This is not a subplan of the ModifyTable node in the conventional
> sense, and so does not appear within EXPLAIN output.

...that sounds wonky.

> I already mentioned the inability to reference rejected rows in an
> UPDATE, as well as my unease about VACUUM interlocking, both of which
> are open item. Also, some of the restrictions that I already mentioned
> - on updatable views, inheritance, and foreign tables - are probably
> unnecessary. We should be able to come with reasonable behavior for at
> least some of those.

If you've noted my comments on the UPDATE/DELETE .. ORDER BY .. LIMIT thread, you won't be surprised to hear that I think those restrictions will need to be lifted - especially for inheritance, but probably the others as well.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Wed, Sep 3, 2014 at 9:51 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> Essentially, the implementation has all stages of query processing
>> During the execution of the parent ModifyTable, a special auxiliary
>> subquery (the UPDATE ModifyTable) is considered as a special case.
>> This is not a subplan of the ModifyTable node in the conventional
>> sense, and so does not appear within EXPLAIN output.
>
> ...that sounds wonky.

Which part? It certainly wouldn't be helpful if the (say) auxiliary plan's "sequential scan" appeared within EXPLAIN output. That's just an implementation detail. Note that the structure of the plan is highly restricted, since it needs to be "driven by the insert" (or, rather, the insert's conflicts, including conflicts not visible to the command's MVCC snapshot). There won't be any interesting variation in the plan. Although, that said, the implementation should probably display any "Filter: ..." conditions implied by the special UPDATE qual.

> If you've noted my comments on the UPDATE/DELETE .. ORDER BY .. LIMIT
> thread, you won't be surprised to hear that I think those restrictions
> will need to be lifted - especially for inheritance, but probably the
> others as well.

Sure.

--
Peter Geoghegan
On Wed, Sep 3, 2014 at 9:51 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> INSERT INTO upsert(key, val) VALUES(1, 'insert') ON CONFLICT WITHIN
>> upsert_pkey UPDATE SET val = 'update';
>
> It seems to me that it would be better to specify a conflicting column
> set rather than a conflicting index name.

I'm open to pursuing that, provided there is a possible implementation that's robust against things like BEFORE triggers that modify constrained attributes. It must also work well with partial unique indexes. So I imagine we'd have to determine a way of looking up the unique index only after BEFORE triggers fire. Unless you're comfortable with punting on some of these cases by throwing an error, all of this is actually surprisingly ticklish. You've already expressed concerns about the feature not playing nice with existing, peripheral features, though.

--
Peter Geoghegan
On Wed, Sep 3, 2014 at 2:13 PM, Peter Geoghegan <pg@heroku.com> wrote:
> On Wed, Sep 3, 2014 at 9:51 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> Essentially, the implementation has all stages of query processing
>>> During the execution of the parent ModifyTable, a special auxiliary
>>> subquery (the UPDATE ModifyTable) is considered as a special case.
>>> This is not a subplan of the ModifyTable node in the conventional
>>> sense, and so does not appear within EXPLAIN output.
>>
>> ...that sounds wonky.
>
> Which part? It certainly wouldn't be helpful if the (say) auxiliary
> plan's "sequential scan" appeared within EXPLAIN output. That's just
> an implementation detail. Note that the structure of the plan is
> highly restricted, since it needs to be "driven by the insert" (or,
> rather, the insert's conflicts, including conflicts not visible to the
> command's MVCC snapshot). There won't be any interesting variation in
> the plan. Although, that said, the implementation should probably
> display any "Filter: ..." conditions implied by the special UPDATE
> qual.

I think there shouldn't be any plan nodes in the system that don't get displayed by explain. If you're using a plan node for something, and think it shouldn't be displayed by explain, then either (1) you are wrong or (2) you are abusing the plan node.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Sep 4, 2014 at 8:03 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> I think there shouldn't be any plan nodes in the system that don't get
> displayed by explain. If you're using a plan node for something, and
> think it shouldn't be displayed by explain, then either (1) you are
> wrong or (2) you are abusing the plan node.

Maybe. I admit that I'm not entirely confident that the representation of the auxiliary state during planning and execution is ideal. However, it sure is convenient to be able to separately plan the auxiliary query as a subquery, and not have to specially fish it out of the subplan list later. Maybe we should add a mechanism that essentially generates an equivalent, single ModifyTable plan. Or maybe that would be adding a lot of code for no tangible benefit. I don't see much point in making one ModifyTable node pull up from the other for the benefit of this feature (which is another thing entirely from having there be a single ModifyTable plan).

For now, I'm glad to have something that will allow us to drive discussion of the feature to the next level. I don't have a good enough understanding of the optimizer to be able to say with confidence what we should do, or to be able to see the big picture of making any particular trade-off. It's not an immediate concern, though.

--
Peter Geoghegan
On Thu, Sep 4, 2014 at 11:55 AM, Peter Geoghegan <pg@heroku.com> wrote:
> It's not an immediate concern, though.

My immediate concern is to get some level of buy-in about how everything fits together at a high level. Separately, as discussed in my opening mail, there is the question of how value locking should ultimately be implemented. These are two orthogonal questions, or pretty close to orthogonal. That helps. It also helps that people have stopped being confused by the term "value locking" (I think).

I'm tempted to read the silence on the question of how things fit together (such as the lack of discussion of my pgCon talk's characterization of a "pick any 2" trade-off) as meaning that everyone agrees. That seems pretty naive, though, because a lot of the issues are very subtle. I think that various interested people, including Robert and Andres, have yet to make their minds up on that. I'm not sure what Tom thinks of it.
-- 
Peter Geoghegan
On Wed, Aug 27, 2014 at 7:43 PM, Peter Geoghegan <pg@heroku.com> wrote:
> Omission
> =======
>
> The patch currently lacks a way of referencing datums rejected for insertion when updating.

Attached revision of the patch set (which I'll call v1.2) adds this capability in a separate commit. It now becomes possible to add a CONFLICTING expression within the ON CONFLICT UPDATE targetlist or predicate. Example use:

"""
postgres=# CREATE TABLE upsert(key int4 PRIMARY KEY, val text);
CREATE TABLE
postgres=# INSERT INTO upsert VALUES(1, 'Giraffe');
INSERT 0 1
postgres=# SELECT * FROM upsert;
 key |   val
-----+---------
   1 | Giraffe
(1 row)

postgres=# INSERT INTO upsert VALUES(1, 'Bear'), (2, 'Lion') ON CONFLICT UPDATE SET val = CONFLICTING(val);
INSERT 0 1
postgres=# SELECT * FROM upsert;
 key | val
-----+------
   1 | Bear
   2 | Lion
(2 rows)
"""

Note that the effects of BEFORE INSERT triggers are carried over here, which I slightly favor over the alternative of not having it work that way.

I've also expanded upon my explanation for the structure of the query tree and plan within (revised/rebased versions of) earlier commits. I am clearer on why there is a special subquery planning step for the auxiliary UPDATE, rather than making the UPDATE directly accessible as a subquery within the post-parse-analysis query tree. Basically, the optimizer has no basis for understanding that a DML sublink isn't optimizable. It'll try to pull up the subquery and so on, which of course does not and cannot work. Whereas treating it as an independently planned subquery of the top-level query, kind of like a data-modifying CTE, makes sense. (With such CTEs, the executor is prepared for the possibility that not all rows will be pulled up, so there too the executor drives execution more directly than makes sense when not dealing with DML: it finishes off the data-modifying CTE's DML for any still-unconsumed tuples, within ExecPostprocessPlan().)
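For comparison, the CONFLICTING() expression plays much the same role as the "excluded" pseudo-table in SQLite's upsert syntax (SQLite 3.24 and later); the same Bear/Lion example can be sketched through Python's sqlite3 module. Note that, unlike the patch, SQLite makes the conflict target mandatory, and none of the BEFORE trigger subtleties discussed here apply:

```python
import sqlite3

# The Bear/Lion example above, via SQLite's upsert syntax (3.24+),
# whose "excluded" pseudo-table plays the role of CONFLICTING().
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE upsert(key INTEGER PRIMARY KEY, val TEXT)")
conn.execute("INSERT INTO upsert VALUES (1, 'Giraffe')")

# Row 1 conflicts and is updated to the rejected value; row 2 is inserted.
conn.execute(
    "INSERT INTO upsert VALUES (1, 'Bear'), (2, 'Lion') "
    "ON CONFLICT(key) DO UPDATE SET val = excluded.val"
)

print(conn.execute("SELECT key, val FROM upsert ORDER BY key").fetchall())
# [(1, 'Bear'), (2, 'Lion')]
```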
It's certainly possible that a more unified representation makes sense (i.e. one ModifyTable plan, likely still having separate INSERT/UPDATE representations at earlier stages of query processing), but that would require serious refactoring of the representation of ModifyTable operations -- just for example, consider the need for a unified-though-separate targetlist, one for the INSERT part, the other for the UPDATE part. For now, I continue to find it very convenient to represent the UPDATE as a selectively executed, auxiliary, distinct ModifyTable plan, rather than adding a subquery rangetable directly during parse analysis.

There is another significant change. In this revision, I am at least "honest" about the plan representation within EXPLAIN:

"""
postgres=# EXPLAIN ANALYZE INSERT INTO upsert VALUES(1, 'Bear'), (2, 'Lion') ON CONFLICT UPDATE SET val = CONFLICTING(val);
                                                   QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Insert on upsert  (cost=0.00..0.03 rows=2 width=36) (actual time=0.115..0.115 rows=0 loops=1)
   ->  Values Scan on "*VALUES*"  (cost=0.00..0.03 rows=2 width=36) (actual time=0.003..0.005 rows=2 loops=1)
   ->  Conflict Update on upsert  (cost=0.00..22.30 rows=1230 width=36) (actual time=0.042..0.051 rows=0 loops=1)
         ->  Seq Scan on upsert  (cost=0.00..22.30 rows=1230 width=36) (never executed)
 Planning time: 0.065 ms
 Execution time: 0.158 ms
(6 rows)

postgres=# EXPLAIN ANALYZE INSERT INTO upsert VALUES(1, 'Bear'), (2, 'Lion') ON CONFLICT UPDATE SET val = CONFLICTING(val) where key = 2;
                                                 QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Insert on upsert  (cost=0.00..0.03 rows=2 width=36) (actual time=0.075..0.075 rows=0 loops=1)
   ->  Values Scan on "*VALUES*"  (cost=0.00..0.03 rows=2 width=36) (actual time=0.001..0.002 rows=2 loops=1)
   ->  Conflict Update on upsert  (cost=4.16..8.17 rows=1 width=36) (actual time=0.012..0.026 rows=0 loops=1)
         ->  Bitmap Heap Scan on upsert  (cost=4.16..8.17 rows=1 width=36) (never executed)
               Recheck Cond: (key = 2)
               ->  Bitmap Index Scan on upsert_pkey  (cost=0.00..4.16 rows=1 width=0) (never executed)
                     Index Cond: (key = 2)
 Planning time: 0.090 ms
 Execution time: 0.125 ms
(9 rows)
"""

The second query gets a bitmap scan because plain index scans have been disabled for the UPDATE (a temporary kludge), since index-only scans can break things - IndexOnlyRecheck() throws an error. I'm not quite sure why the optimizer doesn't care about resjunk for the UPDATE here, which is presumably why in general regular updates never use index-only scans. Since I think the actual auxiliary plan generation needs work, so as to not have uselessly complicated plans, I didn't try too hard to figure that out.

This plan structure is not acceptable, of course, but maybe almost the same thing would be acceptable if the auxiliary plan shown here wasn't unnecessarily complex - if we forced a simple pseudo-scan placeholder, without wasting optimizer cycles, somewhat in the style of WHERE CURRENT OF. This is something discussed in newly expanded comments within planner.c. I would have made the optimizer produce a suitably simple plan myself, but I don't have a good enough understanding of it to figure out how (at least in a reasonable amount of time). Pointers on how this might be accomplished are very welcome.

With this addition, the feature is functionally complete. That just leaves the small matter of how it has been implemented. :-)

This is still clearly a work in progress implementation, with design trade-offs that are very much in need of fairly high level discussion.
-- 
Peter Geoghegan
Attachment
- 0001-Make-UPDATE-privileges-distinct-from-INSERT-privileg.patch.gz
- 0005-Internal-documentation-for-INSERT-.-ON-CONFLICT-UPDA.patch.gz
- 0004-Tests-for-INSERT-.-ON-CONFLICT-UPDATE-IGNORE.patch.gz
- 0003-CONFLICTING-expressions-within-ON-CONFLICT-UPDATE.patch.gz
- 0002-Support-INSERT-.-ON-CONFLICT-UPDATE-IGNORE.patch.gz
On 28 August 2014 03:43, Peter Geoghegan <pg@heroku.com> wrote:
> The patch currently lacks a way of referencing datums rejected for insertion when updating. The way MySQL handles the issue seems questionable. They allow you to do something like this:
>
> INSERT INTO upsert (key, val) VALUES (1, 'val') ON DUPLICATE KEY UPDATE val = VALUES(val);
>
> The implication is that the updated value comes from the INSERT's VALUES() list, but emulating that seems like a bad idea. In general, at least with Postgres it's entirely possible that values rejected differ from the values appearing in the VALUES() list, due to the effects of before triggers. I'm not sure whether or not we should assume equivalent transformations during any UPDATE before triggers.
>
> This is an open item. I think it makes sense to deal with it a bit later.

IMHO it is impossible to know if any of the other code is correct until we have a clear and stable vision of what the command is supposed to perform. The inner workings are less important than what the feature does.

FWIW, the row available at the end of all BEFORE triggers is clearly the object we should be manipulating, not the original VALUES() clause. Otherwise this type of INSERT would behave differently from normal INSERTs, which would likely violate RLS, if nothing else.

> As I mentioned, I have incorporated feedback from Kevin Grittner. You may specify a unique index to merge on from within the INSERT statement, thus avoiding the risk of inadvertently having the update affect the wrong tuple due to the user failing to consider that there was a would-be unique violation within some other unique index constraining some other attribute. You may write the DML statement like this:
>
> INSERT INTO upsert(key, val) VALUES(1, 'insert') ON CONFLICT WITHIN upsert_pkey UPDATE SET val = 'update';
>
> I think that there is a good chance that at least some people will want to make this mandatory. I guess that's fair enough, but I *really* don't want to *mandate* that users specify the name of their unique index in DML for obvious reasons. Perhaps we can come up with a more tasteful syntax that covers all interesting cases (consider the issues with partial unique indexes and before triggers for example, where a conclusion reached about which index to use during parse analysis may subsequently be invalidated by user-defined code, or ambiguous specifications in the face of overlapping attributes between two unique composite indexes, etc). The Right Thing is far from obvious, and there is very little to garner from other systems, since SQL MERGE promises essentially nothing about concurrency, both as specified by the standard and in practice. You don't need a unique index at all, and as I showed in my pgCon talk, there are race conditions even for trivial UPSERT operations in all major SQL MERGE implementations.

Surely if there are multiple unique indexes then the result row must be validated against all unique indexes before it is allowed at all?

The only problem I see is if the newly inserted row matches one row on one unique value and a different row on a different unique index. Turning the INSERT into an UPDATE will still fail on one or other, no matter which index you pick. If there is one row for ALL unique indexes then it is irrelevant which index you pick. So either way, I cannot see a reason to specify an index.

If we do need such a construct, we already have the concept of an IDENTITY for a table, added in 9.4, currently targeted at replication. Listing indexes or columns in the DML statement is more pushups for developers and ORMs, so let's KISS.

The way forward, in my view, is to define precisely the behaviour we wish to have. That definition will include the best current mechanism for running an UPSERT using INSERT/UPDATE/loops and comparing that against what is being provided here.
We will then have a functional test of equivalence of the approaches, and a basis for making a performance test that shows that performance is increased without any loss of concurrency. Once we have that, we can then be certain our time spent on internals is not wasted by overlooking a simple userland gotcha. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Sep 25, 2014 at 10:12 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> IMHO it is impossible to know if any of the other code is correct until we have a clear and stable vision of what the command is supposed to perform.

+1.

> The inner workings are less important than what the feature does.

+1.

> FWIW, the row available at the end of all BEFORE triggers is clearly the object we should be manipulating, not the original VALUES() clause. Otherwise this type of INSERT would behave differently from normal INSERTs. Which would likely violate RLS, if nothing else.

+1.

> Surely if there are multiple unique indexes then the result row must be validated against all unique indexes before it is allowed at all?
>
> The only problem I see is if the newly inserted row matches one row on one unique value and a different row on a different unique index. Turning the INSERT into an UPDATE will still fail on one or other, no matter which index you pick. If there is one row for ALL unique indexes then it is irrelevant which index you pick. So either way, I cannot see a reason to specify an index.

Failure could be the right thing in some cases. For example, imagine that a user has a table containing names, email addresses, and (with apologies for the American-ism, but I don't know what would be comparable elsewhere) social security numbers. The user has unique indexes on both email addresses and SSNs. If a new record arrives for the same email address, they want to replace the existing record; but if a new record arrives with the same SSN, they want the transaction to fail. Otherwise, a newly-arrived record might overwrite the email address of an existing record, which they never want to do, because they view email address as the primary key. I think this kind of scenario will actually be pretty common.
-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 25 September 2014 15:35, Robert Haas <robertmhaas@gmail.com> wrote: >> The only problem I see is if the newly inserted row matches one row on >> one unique value and a different row on a different unique index. >> Turning the INSERT into an UPDATE will still fail on one or other, no >> matter which index you pick. If there is one row for ALL unique >> indexes then it is irrelevant which index you pick. So either way, I >> cannot see a reason to specify an index. > > Failure could be the right thing in some cases. For example, imagine > that a user has a table containing names, email addresses, and (with > apologies for the American-ism, but I don't know what would be > comparable elsewhere) social security numbers. The user has unique > indexes on both email addresses and SSNs. If a new record arrives for > the same email address, they want to replace the existing record; but > a new record arrives with the same SSN, they want the transaction to > fail. Otherwise, a newly-arrived record might overwrite the email > address of an existing record, which they never want to do, because > they view email address as the primary key. I agree with your example, but not your conclusion. If a new record arrives with a new email address that matches an existing record it will fail. There is a case that would be allowed, which would be a record that creates an entirely new email address. So you do have a point to argue from. However, IMV enforcing such a restriction should be done with an After trigger, which is already possible, not by complicating a DML statement with information it shouldn't need to know, or that might change in the future. Let's keep this new feature as simple as possible. ORMs everywhere need to be encouraged to implement this and they won't do it unless it is bone simple to use. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Sep 25, 2014 at 11:21 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > On 25 September 2014 15:35, Robert Haas <robertmhaas@gmail.com> wrote: >>> The only problem I see is if the newly inserted row matches one row on >>> one unique value and a different row on a different unique index. >>> Turning the INSERT into an UPDATE will still fail on one or other, no >>> matter which index you pick. If there is one row for ALL unique >>> indexes then it is irrelevant which index you pick. So either way, I >>> cannot see a reason to specify an index. >> >> Failure could be the right thing in some cases. For example, imagine >> that a user has a table containing names, email addresses, and (with >> apologies for the American-ism, but I don't know what would be >> comparable elsewhere) social security numbers. The user has unique >> indexes on both email addresses and SSNs. If a new record arrives for >> the same email address, they want to replace the existing record; but >> a new record arrives with the same SSN, they want the transaction to >> fail. Otherwise, a newly-arrived record might overwrite the email >> address of an existing record, which they never want to do, because >> they view email address as the primary key. > > I agree with your example, but not your conclusion. > > If a new record arrives with a new email address that matches an > existing record it will fail. There is a case that would be allowed, > which would be a record that creates an entirely new email address. So > you do have a point to argue from. > > However, IMV enforcing such a restriction should be done with an After > trigger, which is already possible, not by complicating a DML > statement with information it shouldn't need to know, or that might > change in the future. I've never been a fan of putting the index name in there. I agree that's stuff that a DML statement shouldn't need to know about. 
What I've advocated for in the past is specifying the list of columns that should be used to determine whether to insert or update. If you have a match on those columns, update the row; else insert. Any other unique indexes stand or fall as may be. I still think that idea has merit. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
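As a toy model (plain Python, with invented names; real arbitration would of course be against catalog unique indexes, not dicts), the column-list proposal combined with Robert's email/SSN scenario upthread behaves like this: a match on the named arbiter columns becomes an UPDATE, while every other unique constraint still stands or falls on its own.

```python
# Toy model of "specify the conflict columns, not the index".
# All names here are illustrative, not taken from the patch.

class UniqueViolation(Exception):
    pass

class Table:
    def __init__(self, unique_cols):
        self.unique_cols = unique_cols          # e.g. ["email", "ssn"]
        self.rows = []

    def upsert(self, row, conflict_cols):
        assert set(conflict_cols) <= set(self.unique_cols)
        for existing in self.rows:
            if all(existing[c] == row[c] for c in conflict_cols):
                # Conflict on the arbiter columns: turn into an UPDATE...
                candidate = {**existing, **row}
                # ...but every *other* unique constraint still applies.
                for col in self.unique_cols:
                    if col in conflict_cols:
                        continue
                    if any(o is not existing and o[col] == candidate[col]
                           for o in self.rows):
                        raise UniqueViolation(col)
                existing.update(row)
                return "updated"
        # No arbiter match: a plain INSERT, checked against all unique cols.
        for col in self.unique_cols:
            if any(o[col] == row[col] for o in self.rows):
                raise UniqueViolation(col)
        self.rows.append(dict(row))
        return "inserted"

t = Table(unique_cols=["email", "ssn"])
t.upsert({"email": "a@x.com", "ssn": "111", "name": "A"}, ["email"])
t.upsert({"email": "b@x.com", "ssn": "222", "name": "B"}, ["email"])

# Same email: replace the existing record.
print(t.upsert({"email": "a@x.com", "ssn": "111", "name": "A2"}, ["email"]))

# New email but an existing SSN: fails, rather than silently
# rewriting the email of whoever holds SSN 222.
try:
    t.upsert({"email": "c@x.com", "ssn": "222", "name": "C"}, ["email"])
except UniqueViolation as e:
    print("unique violation on", e)
```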
On 28 August 2014 03:43, Peter Geoghegan <pg@heroku.com> wrote:
> "Value locking"
> ===========
>
> To date, on-list discussion around UPSERT has almost exclusively concerned what I've called "value locking"; the idea of locking values in unique indexes in the abstract (to establish the right to insert ahead of time). There was some useful discussion on this question between myself and Heikki back around December/January. Ultimately, we were unable to reach agreement on an approach and discussion tapered off. However, Heikki did understand the concerns that informed my design. He recognized the need to be able to easily *release* value locks, so as to avoid "unprincipled deadlocks", where under high concurrency there are deadlocks between sessions that only UPSERT a single row at a time. I'm not sure how widely appreciated this point is, but I believe that Heikki appreciates it. It is a very important point in my opinion. I don't want an implementation that is in any way inferior to the "UPSERT looping subxact" pattern (i.e. the plpgsql thing that the docs suggest).
>
> When we left off, Heikki continued to favor an approach that involved speculatively inserting heap tuples, and then deleting them in the event of a conflict. This design was made more complicated when the need to *release* value locks became apparent (Heikki ended up making some changes to HeapTupleSatisfiesDirty(), as well as sketching a design for what you might call a "super delete", where xmin can be set to InvalidTransactionId for speculatively-inserted heap tuples). After all, it wasn't as if we could abort a subxact to release locks, which is what the "UPSERT looping subxact" pattern does. I think it's fair to say that that design became more complicated than initially anticipated [4] [5].
>
> Anyway, the greater point here is that fundamentally, AFAICT Heikki and I were in agreement.
Once you buy into the idea that we must avoid > holding on to "value locks" of whatever form - as Heikki evidently did > - then exactly what form they take is ultimately only a detail. > Granted, it's a very important detail, but a detail nonetheless. It > can be discussed entirely independently of all of this new stuff, and > thank goodness for that. > > If anyone finds my (virtually unchanged) page heavyweight lock based > value locking approach objectionable, I ask that the criticism be > framed in a way that makes a sharp distinction between each of the > following: > > 1. You don't accept that value locks must be easily released in the > event of a conflict. Is anyone in this camp? It's far from obvious to > me what side of this question Andres is on at this stage, for example. > Robert might have something to say here too. > > 2. Having taken into account the experience of myself and Heikki, and > all that is implied by taking that approach ***while avoiding > unprincipled deadlocks***, you continue to believe that an approach > based on speculative heap insertion, or some alternative scheme is > better than what I have done to the nbtree code here, or you otherwise > dislike something about the proposed value locking scheme. You accept > that value locks must be released and released easily in the event of > a conflict, but like Heikki you just don't like what I've done to get > there. > > Since we can (I believe) talk about the value locking aspect and the > rest of the patch independently, we should do so...unless you're in > camp 1, in which case I guess that we'll have to thrash it out. I'm trying to understand and help out with pushing this patch forwards to completion. Basically, I have absolutely no idea whether I object to or agree with 1) and don't know where to look to find out. We need a clear exposition of design and the alternatives. 
My approach would be to insert an index tuple for that value into the index, but with the leaf ituple marked with an xid rather than a ctid. If someone tries to insert into the index they would see this and wait for the inserting transaction to end. The inserting transaction would then resolve what happens in the heap (insert/update) and later repoint the index tuple to the inserted/updated row version. I don't see the need for page level locking since it would definitely result in deadlocks (e.g. SQLServer). -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Sep 25, 2014 at 11:17 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > Basically, I have absolutely no idea whether I object to or agree with > 1) and don't know where to look to find out. We need a clear > exposition of design and the alternatives. > > My approach would be to insert an index tuple for that value into the > index, but with the leaf ituple marked with an xid rather than a ctid. > If someone tries to insert into the index they would see this and wait > for the inserting transaction to end. The inserting transaction would > then resolve what happens in the heap (insert/update) and later > repoint the index tuple to the inserted/updated row version. I don't > see the need for page level locking since it would definitely result > in deadlocks (e.g. SQLServer). The page level locks are only used to prevent concurrent insertion for as long as it takes to get consensus to proceed among unique indexes, and to actually insert a heap tuple. They're all released before we lock the tuple for update, should we take that path (yes, really). This is consistent with the behavior of other systems, I think. That's my whole reason for preferring to do things that way. If you have a "promise tuples" approach - be it what you outline here, or what Heikki prototyped with heap tuple insertion, or any other - then you need a way to *release* those "value locks" in the event of a conflict/needing to update, before locking/updating. Otherwise, you get deadlocks. This is an issue I highlighted when it came up with Heikki's prototype. AFAICT, any scheme for "value locking" needs to strongly consider the need to *release* value locks inexpensively. Whatever else they do, they cannot persist for the duration of the transaction IMV. Does that make sense? If not, my next suggestion is applying an earlier revision of Heikki's prototype, and seeing for yourself how it can be made to deadlock in an unprincipled/impossible to prevent way [1]. 
You've quite rightly highlighted the existing subxact looping pattern as something that this needs to be better than in every way. This is one important way in which we might fail to live up to that standard. [1] http://www.postgresql.org/message-id/52B4AAF0.5090806@vmware.com -- Peter Geoghegan
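A deliberately tiny, single-index sketch of the ordering Peter describes: take the value lock, detect any conflict, and on conflict release the value lock *before* going to lock the existing row. The invariant is that nothing is held by the time we wait on the conflicting tuple, so no one can be waiting on our value lock while we wait on their row. All names are invented; the real value locks are page heavyweight locks in nbtree, not dict entries.

```python
# Phase 1: value lock. On conflict, release it before tuple locking,
# so a session blocked on our row lock is never also blocked on a
# value lock we still hold. (Illustrative simulation only.)

value_locks = {}                  # (index, value) -> locking session
heap = {"a@x.com": "row-1"}       # existing unique value -> tuple id

def upsert_attempt(session, value):
    key = ("email_idx", value)
    assert key not in value_locks, "would block on a concurrent inserter"
    value_locks[key] = session                 # phase 1: value lock taken
    if value in heap:
        del value_locks[key]                   # conflict: release first,
        return ("lock-tuple-then-update", heap[value])   # then lock the row
    heap[value] = "row-" + session             # no conflict: insert proceeds
    del value_locks[key]
    return ("inserted", heap[value])

outcome, target = upsert_attempt("s1", "a@x.com")
print(outcome, target)     # lock-tuple-then-update row-1
print(value_locks)         # {} -- nothing held while we wait on row-1's lock
outcome, target = upsert_attempt("s2", "b@x.com")
print(outcome, target)     # inserted row-s2
```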
On Thu, Sep 25, 2014 at 7:35 AM, Robert Haas <robertmhaas@gmail.com> wrote: > On Thu, Sep 25, 2014 at 10:12 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >> IMHO it is impossible to know if any of the other code is correct >> until we have a clear and stable vision of what the command is >> supposed to perform. > > +1. > >> The inner workings are less important than what the feature does. > > +1. > >> FWIW, the row available at the end of all BEFORE triggers is clearly >> the object we should be manipulating, not the original VALUES() >> clause. Otherwise this type of INSERT would behave differently from >> normal INSERTs. Which would likely violate RLS, if nothing else. > > +1. I agree with all of this. I'm glad that my opinion on how a CONFLICTING() expression interacts with BEFORE triggers is accepted, too. -- Peter Geoghegan
On Thu, Sep 25, 2014 at 9:20 AM, Robert Haas <robertmhaas@gmail.com> wrote: > I've never been a fan of putting the index name in there. Me neither. Although I do understand Kevin's concern about the user's intent surrounding which unique index to merge on. > I agree > that's stuff that a DML statement shouldn't need to know about. What > I've advocated for in the past is specifying the list of columns that > should be used to determine whether to insert or update. If you have > a match on those columns, update the row; else insert. Any other > unique indexes stand or fall as may be. > > I still think that idea has merit. As I've said, my problem with that idea is the corner cases. Consider the possible ambiguity. Could DML queries in production start failing ("ambiguous unique index specification") because the DBA created a new unique index on attributes that somewhat overlap the attributes of the unique index that the DML author actually meant? What about the effects of BEFORE triggers, and their interaction with partial unique indexes? If you can describe an exact behavior that overcomes these issues, then I'll give serious consideration to implementing it. As things stand, to be perfectly honest it sounds like a footgun to me. There are interactions that make getting it right very ticklish. I don't want to make a unique index specification mandatory because that's ugly - that's the only reason, TBH. However, while what you describe here accomplishes the same thing without being ugly, it is potentially very surprising. Naming the unique index directly has the great advantage of very clearly demonstrating user intent. -- Peter Geoghegan
On Thu, Sep 25, 2014 at 2:17 PM, Simon Riggs <simon@2ndquadrant.com> wrote: >> 1. You don't accept that value locks must be easily released in the >> event of a conflict. Is anyone in this camp? It's far from obvious to >> me what side of this question Andres is on at this stage, for example. >> Robert might have something to say here too. >> >> 2. Having taken into account the experience of myself and Heikki, and >> all that is implied by taking that approach ***while avoiding >> unprincipled deadlocks***, you continue to believe that an approach >> based on speculative heap insertion, or some alternative scheme is >> better than what I have done to the nbtree code here, or you otherwise >> dislike something about the proposed value locking scheme. You accept >> that value locks must be released and released easily in the event of >> a conflict, but like Heikki you just don't like what I've done to get >> there. >> >> Since we can (I believe) talk about the value locking aspect and the >> rest of the patch independently, we should do so...unless you're in >> camp 1, in which case I guess that we'll have to thrash it out. > > I'm trying to understand and help out with pushing this patch forwards > to completion. > > Basically, I have absolutely no idea whether I object to or agree with > 1) and don't know where to look to find out. We need a clear > exposition of design and the alternatives. I laughed when I read this, because I think a lot of the discussion on this topic has been unnecessarily muddled by jargon. > My approach would be to insert an index tuple for that value into the > index, but with the leaf ituple marked with an xid rather than a ctid. > If someone tries to insert into the index they would see this and wait > for the inserting transaction to end. The inserting transaction would > then resolve what happens in the heap (insert/update) and later > repoint the index tuple to the inserted/updated row version. 
I don't > see the need for page level locking since it would definitely result > in deadlocks (e.g. SQLServer). I think that something like this might work, but the devil is in the details. Suppose two people try to upsert into the same table at the same time. There's one index. If the transactions search that index for conflicts first, neither sees any conflicting tuples, and both proceed. That's no good. OK, so suppose each transaction inserts the special index tuple which you mention, to lock out concurrent inserts of that value, and then searches for already-existing conflicts. Each sees the other's tuple, and they deadlock. That's no good, either. Also, I think there are other cases where we think we're going to insert, so we put the special index tuple in there, but then we decide to update, so we don't need the promise tuple any more, but other sessions are potentially still waiting for our XID to terminate even though there's no conflict any more. I'm having a hard time bringing the details of those cases to mind ATM, though. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
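Robert's two-session scenario is a textbook wait-for cycle: each session inserts its promise tuple first, then searches for pre-existing conflicts, finds the other session's promise, and blocks on its XID. A sketch of how that shows up if you model it (pure illustration; names invented):

```python
def find_cycle(waits_for):
    """Return the sessions visited on a wait chain that revisits
    itself (a deadlock), or None if every chain terminates."""
    for start in waits_for:
        seen, node = set(), start
        while node in waits_for:
            if node in seen:
                return sorted(seen)
            seen.add(node)
            node = waits_for[node]
    return None

# Two sessions both upsert key=1: A waits on B's promise, B on A's.
waits_for = {"xact A": "xact B", "xact B": "xact A"}
print(find_cycle(waits_for))          # ['xact A', 'xact B']
```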
On Thu, Sep 25, 2014 at 12:11 PM, Robert Haas <robertmhaas@gmail.com> wrote: > I think that something like this might work, but the devil is in the > details. Suppose two people try to upsert into the same table at the > same time. There's one index. If the transactions search that index > for conflicts first, neither sees any conflicting tuples, and both > proceed. That's no good. OK, so suppose each transaction inserts the > special index tuple which you mention, to lock out concurrent inserts > of that value, and then searches for already-existing conflicts. Each > sees the other's tuple, and they deadlock. That's no good, either. I'm very glad that you share my concern about deadlocks like this. > Also, I think there are other cases where we think we're going to > insert, so we put the special index tuple in there, but then we decide > to update, so we don't need the promise tuple any more, but other > sessions are potentially still waiting for our XID to terminate even > though there's no conflict any more. I'm having a hard time bringing > the details of those cases to mind ATM, though. Well, you might have a promise tuple in a unique index on attributes not appearing in the UPDATE's targetlist, for one. You have the other session waiting (doesn't have to be an upserter) just because we *thought about* inserting a value as part of an upsert. That's pretty bad. -- Peter Geoghegan
On Thu, Sep 25, 2014 at 7:12 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > The way forwards, in my view, is to define precisely the behaviour we > wish to have. That definition will include the best current mechanism > for running an UPSERT using INSERT/UPDATE/loops and comparing that > against what is being provided here. We will then have a functional > test of equivalence of the approaches, and a basis for making a > performance test that shows that performance is increased without any > loss of concurrency. That sounds very reasonable. While I'm sure that what I have here can decisively beat the xact looping pattern in terms of performance as measured by pgbench, the real performance advantage is that this approach doesn't burn through XIDs. That was a concern that Andres highlighted in relation to using the subxact looping pattern with BDR's multi-master replication conflict resolution. -- Peter Geoghegan
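The XID consumption Peter mentions can be sketched with a toy model of the documented looping pattern, in which every failed attempt costs a fresh subtransaction ID (all names invented; this simulates only the control flow, not real transactions):

```python
import itertools

# Each lap of the retry loop runs in a fresh subtransaction, so each
# failed attempt burns one subtransaction XID.

xid_counter = itertools.count(1)

def looping_upsert(table, key, val, rival_inserts):
    burned = []
    while True:
        burned.append(next(xid_counter))   # BEGIN a subxact: one XID
        if key in table:                   # UPDATE path succeeds
            table[key] = val
            return burned
        if rival_inserts and rival_inserts.pop(0):
            table[key] = "rival"           # concurrent insert wins the race:
            continue                       # our INSERT hits a unique
                                           # violation, the subxact aborts,
                                           # and we loop
        table[key] = val                   # INSERT path succeeds
        return burned

table = {}
# A rival session sneaks an insert in during our first attempt.
burned = looping_upsert(table, 1, "val", [True])
print(len(burned), table[1])               # 2 val  (two XIDs for one upsert)
```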
On 25 September 2014 20:11, Robert Haas <robertmhaas@gmail.com> wrote:
>> My approach would be to insert an index tuple for that value into the
>> index, but with the leaf ituple marked with an xid rather than a ctid.
>> If someone tries to insert into the index they would see this and wait
>> for the inserting transaction to end. The inserting transaction would
>> then resolve what happens in the heap (insert/update) and later
>> repoint the index tuple to the inserted/updated row version. I don't
>> see the need for page level locking since it would definitely result
>> in deadlocks (e.g. SQLServer).
>
> I think that something like this might work, but the devil is in the
> details. Suppose two people try to upsert into the same table at the
> same time. There's one index. If the transactions search that index
> for conflicts first, neither sees any conflicting tuples, and both
> proceed. That's no good. OK, so suppose each transaction inserts the
> special index tuple which you mention, to lock out concurrent inserts
> of that value, and then searches for already-existing conflicts. Each
> sees the other's tuple, and they deadlock. That's no good, either.

The test index is unique, so our to-be-inserted value exists on only one page, hence page locking applies while we insert it. The next person to insert waits for the page lock and then sees the test tuple.

The page lock lasts only for the duration of the insertion of the ituple, not for the whole operation.

> Also, I think there are other cases where we think we're going to
> insert, so we put the special index tuple in there, but then we decide
> to update, so we don't need the promise tuple any more, but other
> sessions are potentially still waiting for our XID to terminate even
> though there's no conflict any more. I'm having a hard time bringing
> the details of those cases to mind ATM, though.

We make the decision to INSERT or UPDATE based upon what we find in the test index. If a value is there already, we assume it's an UPDATE and go to update the row this points to. If it has been deleted, we loop back and try again/error. If the value is not present, we insert the test tuple and progress as an INSERT, then loop back later to set the ctid. There is no case of "don't need the promise tuple any more". We would use the PK, identity or first unique index as the test index. There is a case where an UPSERT conflicts with an INSERT, causing the latter to abort.

Anyway, this is why we need the design more clearly exposed, so you can tell me I'm wrong by showing me the URL of it done right.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 25 September 2014 19:59, Peter Geoghegan <pg@heroku.com> wrote:
> On Thu, Sep 25, 2014 at 9:20 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> I've never been a fan of putting the index name in there.
>
> Me neither. Although I do understand Kevin's concern about the user's
> intent surrounding which unique index to merge on.

The use case cited is real. My solution of using an after trigger works, yet without adding specific functionality to this command. So we can achieve what users want without complicating things here. If we do decide we really want it, let's add it as a later patch.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 25 September 2014 20:38, Peter Geoghegan <pg@heroku.com> wrote:
> On Thu, Sep 25, 2014 at 7:12 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> The way forwards, in my view, is to define precisely the behaviour we
>> wish to have. That definition will include the best current mechanism
>> for running an UPSERT using INSERT/UPDATE/loops and comparing that
>> against what is being provided here. We will then have a functional
>> test of equivalence of the approaches, and a basis for making a
>> performance test that shows that performance is increased without any
>> loss of concurrency.
>
> That sounds very reasonable.

So I promise not to discuss locking until we get the first things done.

My suggested approach to get this committed is...

A. UPDATE/INSERT privilege infrastructure.
Add tests to it, make it separately committable, so we can get that done.
Submit to Oct CF; get that done early.

B. Agree command semantics by producing these things:
* Explanatory documentation (Ch6.4 Data Manipulation - Upsert)
* SQL Reference Documentation (INSERT)
* Test cases for feature
* Test cases for concurrency
* Test cases for pgbench
All of the above, as a separate committable patch. I hate the fact that you have written no user-facing documentation for this feature. How can anyone tell whether the tests you've written are correct, or even consistent with a particular definition of correctness?
Submit as patch for review only to Oct 15 CF.
We then agree what is required for further work.
At this stage, poll the Django and Rails communities for acceptance and early warning of these features. Listen.

C. Internal weirdness
Submit C based upon earlier agreed B, submit to Dec 15 CF, major patch deadline, so we can fine-tune for last CF.
Then Heikki rewrites half your patch in a better way, you thank him and then we commit. All done.
> While I'm sure that what I have here can
> decisively beat the xact looping pattern in terms of performance as
> measured by pgbench, the real performance advantage is that this
> approach doesn't burn through XIDs. That was a concern that Andres
> highlighted in relation to using the subxact looping pattern with
> BDR's multi-master replication conflict resolution.

But we're still discussing SQL semantics. So first things first, then loop back around, hoping our design has not been concurrently deleted...

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Sep 25, 2014 at 1:21 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> The test index is unique, so our to-be-inserted value exists on only
> one page, hence page locking applies while we insert it. The next
> person to insert waits for the page lock and then sees the test tuple.
>
> The page lock lasts only for the duration of the insertion of the
> ituple, not for the whole operation.

(By "page lock", I take it you mean buffer lock - converting that into a page hwlock is what I do.)

This is where it gets quite complicated. What happens if row locking on upsert finds a conflicting update that changes uniquely-constrained attributes? Sure, a vanilla non-HOT update will fail on inserting a unique index tuple, but *it* can still cause us a row-level conflict, and *it* can only fail (with a dup violation) when we commit/abort. But now we're obligated to wait on it to get the row lock, and it's obligated to wait on us to get the promise tuple lock, or any other sort of "value lock" that hasn't already been released when we go to row lock. Deadlock. You cannot get away with failing to release the promise tuple/value lock if you want to maintain useful guarantees.

It doesn't need to be a vanilla non-HOT update. That's just the simplest example I can think of.

>> Also, I think there are other cases where we think we're going to
>> insert, so we put the special index tuple in there, but then we decide
>> to update, so we don't need the promise tuple any more, but other
>> sessions are potentially still waiting for our XID to terminate even
>> though there's no conflict any more. I'm having a hard time bringing
>> the details of those cases to mind ATM, though.
>
> We make the decision to INSERT or UPDATE based upon what we find in
> the test index. If a value is there already, we assume it's an UPDATE
> and go to update the row this points to. If it has been deleted we
> loop back and try again/error.

Sure, you can throw an error, and that makes things a lot easier. It also implies that the implementation is inferior to the subxact looping pattern, which you've already implied is a thing we must beat in every way. Frankly, I think it's a cop-out to just throw an error, and I don't think it'll end up being some theoretical risk. It'll happen often if it is allowed to happen at all. Allowing it to happen almost defeats the purpose of the feature - the big appeal of the feature is that it makes guarantees about the outcome.

--
Peter Geoghegan
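(A rough sketch of the interleaving described above, as I understand it; the table, columns and key values are hypothetical, and "value lock" stands for whichever implementation - promise tuple or page hwlock - is in use:)

```sql
-- Assume: CREATE TABLE t (k int UNIQUE, v text);
-- and an existing row (2, 'y').
--
-- Session 1 (upserter):
INSERT INTO t VALUES (1, 'x') ON CONFLICT UPDATE SET v = 'x';
--   1. takes a value lock on k = 1 in the unique index
--
-- Session 2 (vanilla non-HOT update changing the unique column):
UPDATE t SET k = 1 WHERE k = 2;
--   2. holds the row lock on the row it is updating, and must insert
--      an index tuple for k = 1, so it blocks on Session 1's value lock
--
--   3. Session 1 sees Session 2's row as a conflict and must wait for
--      its row lock, while still holding the value lock
--
-- Each session now waits on the other: an "unprincipled deadlock",
-- unless the value lock is always released before row locking.
```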
On Thu, Sep 25, 2014 at 1:48 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> A. UPDATE/INSERT privilege infrastructure.
> Add tests to it, make it separately committable, so we can get that done.
> Submit to Oct CF; get that done early.

Makes sense. As long as we assume that we want a unified syntax like this - that is, that we need something vaguely insert-update or update-insert-ish - then we need this. Unfortunately, we cannot add regression tests for this without almost the full patch set.

> B. Agree command semantics by producing these things
> * Explanatory documentation (Ch6.4 Data Manipulation - Upsert)
> * SQL Reference Documentation (INSERT)
> * Test cases for feature
> * Test cases for concurrency
> * Test cases for pgbench

Okay. I do have stress tests, which are separately maintained, in case you missed that:

https://github.com/petergeoghegan/upsert

> All of the above, as a separate committable patch. I hate the fact
> that you have written no user-facing documentation for this feature.
> How can anyone tell whether the tests you've written are correct or
> even consistent with a particular definition of correctness?

I'd hoped that the commit messages, and my discussion of the feature, were adequate.

> Submit as patch for review only to Oct 15 CF
> We then agree what is required for further work
> At this stage, poll the Django and Rails communities for acceptance
> and early warning of these features. Listen.

I know an original founder of the Django project quite well - Jacob Kaplan-Moss (a co-worker - the guy that keynoted pgOpen in its second year). He is very interested in this effort.

> C. Internal weirdness
> Submit C based upon earlier agreed B, submit to Dec 15 CF, major patch
> deadline, so we can fine tune for last CF.
> Then Heikki rewrites half your patch in a better way, you thank him
> and then we commit. All done.
I don't have a problem with Heikki or anyone else rewriting the value locking part of the patch, provided it meets my requirements for such a mechanism. Since Heikki already agreed that that standard should be imposed, he'd hardly take issue with it now.

However, the fact is that once you actually make something like promise tuples meet that standard, at the very least it becomes a lot messier than you'd think. Heikki's final prototype "super deleted" tuples by setting their xmin to InvalidTransactionId. We weren't sure that that doesn't break some random other heapam code. Consider this, for example:

https://github.com/postgres/postgres/blob/REL9_4_STABLE/src/backend/executor/execMain.c#L1961

That looks safe in the face of setting xmin to InvalidTransactionId in the way the later prototype patch did, if you think about it for a while, but there are other places where that is less clear. In short, it becomes something that we have to worry about forever, because "xmin cannot change without the tuple in the slot changing" is clearly an invariant for certain purposes. It might accidentally fail to fail right now, but I'm not comfortable with it.

Now, I might be convinced that that's actually the way to go. I have an open mind. But that will take discussion. I like that page hwlocking is something that many systems do (even including Oracle, I believe). Making big changes to nbtree is always something that deserves to be met with skepticism, but it is nice to have an implementation that lives in the head of AM.

Sorry, I forgot to not talk about locking.

> But we're still discussing SQL semantics. So first things first, then
> loop back around, hoping our design has not been concurrently
> deleted...

I hope the discussion can avoid "unprincipled deadlocks"....

--
Peter Geoghegan
On 25 September 2014 22:13, Peter Geoghegan <pg@heroku.com> wrote:
>> All of the above, as a separate committable patch. I hate the fact
>> that you have written no user-facing documentation for this feature.
>> How can anyone tell whether the tests you've written are correct or
>> even consistent with a particular definition of correctness?
>
> I'd hoped that the commit messages, and my discussion of the feature
> were adequate.

I'd hoped that my discussion was sufficient to persuade you too, but it wasn't.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Sep 25, 2014 at 2:37 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> I'd hoped that the commit messages, and my discussion of the feature
>> were adequate.
>
> I'd hoped that my discussion was sufficient to persuade you too, but it wasn't.

I'll write user-visible docs soon, then.

--
Peter Geoghegan
On 26/09/14 08:21, Simon Riggs wrote:
> On 25 September 2014 20:11, Robert Haas <robertmhaas@gmail.com> wrote:
>
>>> My approach would be to insert an index tuple for that value into the
>>> index, but with the leaf ituple marked with an xid rather than a ctid.
>>> If someone tries to insert into the index they would see this and wait
>>> for the inserting transaction to end. The inserting transaction would
>>> then resolve what happens in the heap (insert/update) and later
>>> repoint the index tuple to the inserted/updated row version. I don't
>>> see the need for page level locking since it would definitely result
>>> in deadlocks (e.g. SQLServer).
>>
>> I think that something like this might work, but the devil is in the
>> details. Suppose two people try to upsert into the same table at the
>> same time. There's one index. If the transactions search that index
>> for conflicts first, neither sees any conflicting tuples, and both
>> proceed. That's no good. OK, so suppose each transaction inserts the
>> special index tuple which you mention, to lock out concurrent inserts
>> of that value, and then searches for already-existing conflicts. Each
>> sees the other's tuple, and they deadlock. That's no good, either.
>
> The test index is unique, so our to-be-inserted value exists on only
> one page, hence page locking applies while we insert it. The next
> person to insert waits for the page lock and then sees the test tuple.
>
> The page lock lasts only for the duration of the insertion of the
> ituple, not for the whole operation.
>
>> Also, I think there are other cases where we think we're going to
>> insert, so we put the special index tuple in there, but then we decide
>> to update, so we don't need the promise tuple any more, but other
>> sessions are potentially still waiting for our XID to terminate even
>> though there's no conflict any more. I'm having a hard time bringing
>> the details of those cases to mind ATM, though.
> We make the decision to INSERT or UPDATE based upon what we find in
> the test index. If a value is there already, we assume it's an UPDATE
> and go to update the row this points to. If it has been deleted we
> loop back and try again/error. If the value is not present, we insert
> the test tuple and progress as an INSERT, then loop back later to set
> the ctid. There is no case of "don't need promise tuple any more". We
> would use the PK, identity or first unique index as the test index.
> There is a case where an UPSERT conflicts with an INSERT causing the
> latter to abort.
>
> Anyway, this is why we need the design more clearly exposed, so you
> can tell me I'm wrong by showing me the URL of it done right.

What happens if the new value(s) of the INSERT/UPDATE require the page to be split? I assume the mechanics of this are catered for, but how does it affect locking & potential deadlocks?

Cheers,
Gavin
On 09/26/2014 12:13 AM, Peter Geoghegan wrote:
> On Thu, Sep 25, 2014 at 1:48 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> C. Internal weirdness
>> Submit C based upon earlier agreed B, submit to Dec 15 CF, major patch
>> deadline, so we can fine tune for last CF.
>> Then Heikki rewrites half your patch in a better way, you thank him
>> and then we commit. All done.
>
> I don't have a problem with Heikki or anyone else rewriting the value
> locking part of the patch, provided it meets my requirements for such
> a mechanism. Since Heikki already agreed that that standard should be
> imposed, he'd hardly take issue with it now.
>
> However, the fact is that once you actually make something like
> promise tuples meet that standard, at the very least it becomes a lot
> messier than you'd think. Heikki's final prototype "super deleted"
> tuples by setting their xmin to InvalidTransactionId. We weren't sure
> that that doesn't break some random other heapam code. Consider this,
> for example:
>
> https://github.com/postgres/postgres/blob/REL9_4_STABLE/src/backend/executor/execMain.c#L1961
>
> So that looks safe in the face of setting xmin to InvalidTransactionId
> in the way the later prototype patch did if you think about it for a
> while, but there are other places where that is less clear. In short,
> it becomes something that we have to worry about for ever, because
> "xmin cannot change without the tuple in the slot changing" is clearly
> an invariant for certain purposes. It might accidentally fail to fail
> right now, but I'm not comfortable with it.

Just to be clear: I wrote the initial patch to demonstrate what I had in mind, because I was not able to explain it well enough otherwise. You pointed out issues with it, which I then fixed. You then pointed out more issues, which I then fixed again. My patch version was a proof of concept, to demonstrate that it can be done.
What I'd like you to do now, as the patch author, is to take the promise tuple approach and clean it up. If the xmin stuff is ugly, figure out some other way to do it.

> Now, I might be convinced that that's actually the way to go. I have
> an open mind. But that will take discussion. I like that page
> hwlocking is something that many systems do (even including Oracle, I
> believe). Making big changes to nbtree is always something that
> deserves to be met with skepticism, but it is nice to have an
> implementation that lives in the head of AM.

I don't know what you mean by "in the head of AM", but IMO it would be far better if we can implement this outside the index AMs. Then it will work with any index AM.

BTW, in the discussions, you pointed out that exclusion constraints currently behave differently from a unique index, when two backends insert a tuple at the same time. With a unique index, one of them will fail, but one is always guaranteed to succeed. With an exclusion constraint, they can both fail if you're unlucky. I think the promise tuples would allow us to fix that, too, while we're at it. In fact, you might consider tackling that first, and build the new INSERT ON CONFLICT syntax on top of that. Basically, an INSERT to a table with an exclusion constraint would be the same as "INSERT ON CONFLICT throw an error". That would be a useful way to split this patch into two parts.

- Heikki
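(As a concrete illustration of the exclusion constraint behavior described above - a sketch with invented names; note that `room WITH =` on a plain integer column requires the btree_gist extension:)

```sql
CREATE EXTENSION IF NOT EXISTS btree_gist;

-- No two rows may share a room with overlapping time ranges.
CREATE TABLE reservation (
    room   int,
    during tsrange,
    EXCLUDE USING gist (room WITH =, during WITH &&)
);

-- If two backends run inserts like this concurrently with overlapping
-- ranges, both can currently fail with an exclusion violation; with a
-- plain unique index, exactly one of two duplicate inserters is
-- guaranteed to succeed.
INSERT INTO reservation
VALUES (101, '[2014-09-26 10:00, 2014-09-26 11:00)');
```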
On 2014-09-26 15:24:21 +0300, Heikki Linnakangas wrote:
> I don't know what you mean by "in the head of AM", but IMO it would be far
> better if we can implement this outside the index AMs. Then it will work
> with any index AM.

Also, it's the only chance to make this ever work across partitions.

Greetings,

Andres Freund

--
Andres Freund                 http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 09/26/2014 03:30 PM, Andres Freund wrote:
> On 2014-09-26 15:24:21 +0300, Heikki Linnakangas wrote:
>> I don't know what you mean by "in the head of AM", but IMO it would be far
>> better if we can implement this outside the index AMs. Then it will work
>> with any index AM.
>
> Also, it's the only chance to make this ever work across partitions.

How so? Assuming there's no overlap in the partitions, you could lock the page in the index of the partition you're inserting to, just like you would insert the promise tuple to the right partition.

- Heikki
On 2014-09-26 15:32:35 +0300, Heikki Linnakangas wrote:
> On 09/26/2014 03:30 PM, Andres Freund wrote:
>> On 2014-09-26 15:24:21 +0300, Heikki Linnakangas wrote:
>>> I don't know what you mean by "in the head of AM", but IMO it would be far
>>> better if we can implement this outside the index AMs. Then it will work
>>> with any index AM.
>>
>> Also, it's the only chance to make this ever work across partitions.
>
> How so? Assuming there's no overlap in the partitions, you could lock the
> page in the index of the partition you're inserting to, just like you would
> insert the promise tuple to the right partition.

Well, the 'no overlap' case is boring. At least if you mean that each partition has distinct value ranges in the index? And the reason the buffer locking approach can't work in the overlapping case is that you'd need to hold a large number of pages locked at the same time. Right?

But primarily I mean that the bulk of the uniqueness checking logic has to live outside the individual AMs. It doesn't sound enticing to reach from inside one AM into another partition's index to do stuff.

Greetings,

Andres Freund

--
Andres Freund                 http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 09/26/2014 03:40 PM, Andres Freund wrote:
> On 2014-09-26 15:32:35 +0300, Heikki Linnakangas wrote:
>> On 09/26/2014 03:30 PM, Andres Freund wrote:
>>> On 2014-09-26 15:24:21 +0300, Heikki Linnakangas wrote:
>>>> I don't know what you mean by "in the head of AM", but IMO it would be far
>>>> better if we can implement this outside the index AMs. Then it will work
>>>> with any index AM.
>>>
>>> Also, it's the only chance to make this ever work across partitions.
>>
>> How so? Assuming there's no overlap in the partitions, you could lock the
>> page in the index of the partition you're inserting to, just like you would
>> insert the promise tuple to the right partition.
>
> Well, the 'no overlap' case is boring.

Ok.

> At least if you mean that each partition has distinct value ranges
> in the index?

Right.

> And the reason the buffer locking approach can't work in the overlapping
> case is that you'd need to hold a large number of pages locked at the same
> time. Right?

Yeah, you would. To be honest, I didn't even think about the overlapping case; I just assumed that the non-overlapping case is the typical one and only thought about that.

> But primarily I mean that the bulk of the uniqueness checking logic has to
> live outside the individual AMs. It doesn't sound enticing to reach from
> inside one AM into another partition's index to do stuff.

Yeah, that's a non-starter. Even with the index locking stuff, though, it wouldn't be the AM's responsibility to reach out to other partitions.

- Heikki
On 2014-09-26 15:58:17 +0300, Heikki Linnakangas wrote:
> On 09/26/2014 03:40 PM, Andres Freund wrote:
>> And the reason the buffer locking approach can't work in the overlapping
>> case is that you'd need to hold a large number of pages locked at the same
>> time. Right?
>
> Yeah, you would. To be honest, I didn't even think about the overlapping
> case; I just assumed that the non-overlapping case is the typical one and
> only thought about that.

I think it's actually quite common to want a uniqueness constraint spanning partitions. Consider e.g. partitioning on the username. You might still want to ensure emails are unique.

>> But primarily I mean that the bulk of the uniqueness checking logic has to
>> live outside the individual AMs. It doesn't sound enticing to reach from
>> inside one AM into another partition's index to do stuff.
>
> Yeah, that's a non-starter. Even with the index locking stuff, though, it
> wouldn't be the AM's responsibility to reach out to other partitions.

I'm thinking of the way btree currently does uniqueness checks. Unless you move a large chunk of that out of the AM, you'll have a hard time building anything crossing partitions based on it. At least I can't see how.

Greetings,

Andres Freund

--
Andres Freund                 http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Sep 26, 2014 at 5:24 AM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
> Just to be clear: I wrote the initial patch to demonstrate what I had in
> mind, because I was not able to explain it well enough otherwise. You
> pointed out issues with it, which I then fixed. You then pointed out more
> issues, which I then fixed again.
> My patch version was a proof of concept, to demonstrate that it can be done.

Right. It was a rough prototype built to prove a point. It also served to show what I was talking about as regards deadlocks (and how the locks could problematically persist in other ways), which I was previously unable to effectively explain to Andres. So it was a very useful exercise, and I wish we did that kind of thing more frequently. But at the same time, I don't want to hold you to that prototype, or misrepresent that prototype as showing your final position on any technical issue. So please correct me if I do that. I've tried to be careful about that.

> What I'd like you to do now, as the patch author, is to take the promise
> tuple approach and clean it up. If the xmin stuff is ugly, figure out some
> other way to do it.

My concern with the xmin stuff is not that it's ugly; it's that it's potentially dangerous. It isn't at all easy to reason about where bugs might appear - lots of things could interact with it in unpredictable ways. I think we'd have to audit a lot of code, all over the place, just to make sure nowhere had an assumption broken. This is a big issue. You are asking me to find a way to save a design that I don't particularly believe in. That might change, but right now I'm afraid that that's the reality. Whereas, my design is entirely contained in the file nbtinsert.c.

> I don't know what you mean by "in the head of AM", but IMO it would be far
> better if we can implement this outside the index AMs. Then it will work
> with any index AM.

I mean that "value locking" is an abstraction that lives in the head of amcanunique AMs.
That kind of encapsulation has considerable value in reducing the risk of bugs. If what I've done has bugs, there aren't that many places that could expose interactions with other complicated code. There are fewer moving parts. It's a generalization of the existing mechanism for unique index enforcement. Plus, database systems have used heavyweight index locks for this kind of thing since the 1970s. That's how this works everywhere else (SQL Server certainly does this for MERGE [1], and only grabs the page-level lock for a second at lower isolation levels, as in my implementation). I think that that ought to count for something.

I will be frank. Everyone knows that the nbtree locking parts of this are never going to be committed over your objections. It cannot happen. And yet, I persist in proposing that we go that way. I may be stubborn, but I am not so stubborn that I'd jeopardize all the work I've put into this to save one aspect of it that no one really cares about anyway (even I only care about meeting my goals for user-visible behavior [2]). I may actually come up with a better way to make what you outline work; then again, I may not. I have no idea, to be honest. It's pretty clear that I'm going to have a hard time getting your basic approach to value locking accepted without rethinking it a lot, though.

Can you really say that you won't have serious misgivings about something like the "tuple->xmin = InvalidTransactionId" swapping, if I actually formally propose it? That's very invasive to a lot of places. And right now, I have no idea how we could do better. I really only want to get to where we have a design that's acceptable. In all sincerity, I may yet be convinced to go your way. It's possible that I've failed to fully understand your concerns. Is it really just about making INSERT ... ON CONFLICT IGNORE work with exclusion constraints (UPDATE clearly makes little sense)?
> Basically, an INSERT to a table with an exclusion constraint would be the
> same as "INSERT ON CONFLICT throw an error". That would be a useful way to
> split this patch into two parts.

I'll think about it. I don't want to do that until I see a way to make your approach to value locking work in a way that someone is actually going to be comfortable committing. I am looking for one.

By the way, IMO stress testing has a very useful role to play in the development of this feature. I've been doing things like trying to flush out races by running long stress tests with random delays artificially added at key points. I would like to make that part of the testing strategy public and transparent.

[1] http://weblogs.sqlteam.com/mladenp/archive/2007/08/03/60277.aspx
[2] Slide 8, "Goals for UPSERT in Postgres": http://www.pgcon.org/2014/schedule/attachments/327_upsert_weird.pdf

--
Peter Geoghegan
On Fri, Sep 26, 2014 at 5:58 AM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
>> And the reason the buffer locking approach can't work in the overlapping
>> case is that you'd need to hold a large number of pages locked at the same
>> time. Right?
>
> Yeah, you would. To be honest, I didn't even think about the overlapping
> case; I just assumed that the non-overlapping case is the typical one and
> only thought about that.

I'm not sure that I follow. Unique constraints don't work across partitions today. Why should this work across partitions in the most general case? Simply because there'd have to be one page lock held per unique index/partition, whereas promise tuples are somewhat like row locks, so presumably only one lock table entry is required?

In other database systems with better partitioning support, there is such a thing as indexes that apply across all partitions ("global indexes"). There are also "local indexes", which can only be unique if that comports with the partitioning key in a way that makes sense. But we don't have anything like global indexes, and even in those other systems there are huge caveats around MERGE and its impact on global indexes (they are automatically *marked unusable* by an SQL MERGE command). So I think making what you have in mind here work for current Postgres partitioning is totally unrealistic, unless (at the very least) someone also goes and writes a global index feature, which is obviously an enormous project.

--
Peter Geoghegan
Peter Geoghegan wrote:
> Can you really say that you won't have serious misgivings
> about something like the "tuple->xmin = InvalidTransactionId"
> swapping, if I actually formally propose it? That's very invasive to a
> lot of places. And right now, I have no idea how we could do better.

FWIW there are 28 callers of HeapTupleHeaderGetXmin.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Sep 26, 2014 at 3:11 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> FWIW there are 28 callers of HeapTupleHeaderGetXmin.

31 by my count, though that difference hardly matters. A lot of those callers are in parts of the code that I don't know well. For example, CheckForSerializableConflictOut().

Don't forget about direct callers of HeapTupleHeaderGetRawXmin(), though. There are plenty of those in tqual.c.

--
Peter Geoghegan
On Fri, Sep 26, 2014 at 3:25 PM, Peter Geoghegan <pg@heroku.com> wrote:
> On Fri, Sep 26, 2014 at 3:11 PM, Alvaro Herrera
> <alvherre@2ndquadrant.com> wrote:
>> FWIW there are 28 callers of HeapTupleHeaderGetXmin.
>
> Don't forget about direct callers of HeapTupleHeaderGetRawXmin(),
> though. There are plenty of those in tqual.c.

Which reminds me: commit 37484ad2 added the opportunistic freezing stuff. To quote the commit message:

"""
Instead of changing the tuple xmin to FrozenTransactionId, the combination of HEAP_XMIN_COMMITTED and HEAP_XMIN_INVALID, which were previously never set together, is now defined as HEAP_XMIN_FROZEN. A variety of previous proposals to freeze tuples opportunistically before vacuum_freeze_min_age is reached have foundered on the objection that replacing xmin by FrozenTransactionId might hinder debugging efforts when things in this area go awry; this patch is intended to solve that problem by keeping the XID around (but largely ignoring the value to which it is set).
"""

Why wouldn't the same objection (the objection that the earlier opportunistic freezing ideas stalled on) apply to directly setting tuple xmin to InvalidTransactionId?

You get the idea, though: making promise tuples possible to release early (before transaction end) by setting tuple xmin to InvalidTransactionId is certainly hard to get right, and seems dangerous.

--
Peter Geoghegan
On Thu, Sep 25, 2014 at 1:48 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > I hate the fact > that you have written no user facing documentation for this feature. Attached patch adds a commit to the existing patchset. For the convenience of reviewers, I've uploaded and made publicly accessible a html build of the documentation. This page is of most interest: http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/on-conflict-docs/sql-insert.html See also: http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/on-conflict-docs/transaction-iso.html http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/on-conflict-docs/ddl-inherit.html http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/on-conflict-docs/sql-createrule.html http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/on-conflict-docs/trigger-definition.html http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/on-conflict-docs/sql-createtrigger.html http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/on-conflict-docs/index-unique-checks.html http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/on-conflict-docs/sql-createview.html http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/on-conflict-docs/postgres-fdw.html -- Peter Geoghegan
On Thu, Sep 25, 2014 at 1:48 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > At this stage, poll the Django and Rails communities for acceptance > and early warning of these features. Listen. FYI, I have asked for input from the Django developers here: https://groups.google.com/forum/#!topic/django-developers/hdzkoLYVjBY -- Peter Geoghegan
On 27 September 2014 23:23, Peter Geoghegan <pg@heroku.com> wrote: > On Thu, Sep 25, 2014 at 1:48 PM, Simon Riggs <simon@2ndquadrant.com> wrote: >> I hate the fact >> that you have written no user facing documentation for this feature. > > Attached patch adds a commit to the existing patchset. For the > convenience of reviewers, I've uploaded and made publicly accessible a > html build of the documentation. This page is of most interest: > > http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/on-conflict-docs/sql-insert.html My request was for the following... Agree command semantics by producing these things * Explanatory documentation (Ch6.4 Data Manipulation - Upsert) * SQL Reference Documentation (INSERT) * Test cases for feature * Test cases for concurrency * Test cases for pgbench because it forces you to show in detail how the command works. Adding a few paragraphs to the INSERT page with two quick examples is not the same level of detail at all and leaves me with the strong impression my input has been assessed as ON CONFLICT IGNORE. Examples of the following are needed "ON CONFLICT UPDATE optionally accepts a WHERE clause condition. When provided, the statement only proceeds with updating if the condition is satisfied. Otherwise, unlike a conventional UPDATE, the row is still locked for update. Note that the condition is evaluated last, after a conflict has been identified as a candidate to update." Question arising: do you need to specify location criteria, or is this an additional filter? When/why would we want that? "Failure to anticipate and prevent would-be unique violations originating in some other unique index than the single unique index that was anticipated as the sole source of would-be uniqueness violations can result in updating a row other than an existing row with conflicting values (if any)." In English, please How would you do "if colA = 3 then ignore else update"? 
No explanation of why the CONFLICTING() syntax differs from OLD./NEW. syntax used in triggers The page makes no mention of the upsert problem, nor is any previous code mentioned. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Sat, Sep 27, 2014 at 11:21 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > My request was for the following... > > Agree command semantics by producing these things > * Explanatory documentation (Ch6.4 Data Manipulation - Upsert) Do you really think I could get an entire chapter out of this? > * SQL Reference Documentation (INSERT) > * Test cases for feature > * Test cases for concurrency All of these were added. There are two new sets of isolation tests, one per variant of the new clause (IGNORE/UPDATE). > * Test cases for pgbench They're not part of the patch proper, but as I've already mentioned I have pgbench based stress-tests on Github. There is a variety of test-cases that test the feature under high concurrency: https://github.com/petergeoghegan/upsert > Examples of the following are needed > > "ON CONFLICT UPDATE optionally accepts a WHERE clause condition. Yes, I realized I missed an example of that one the second I hit "send". The MVCC interactions of this are discussed within transaction-iso.html, FWIW. > Question arising: do you need to specify location criteria, or is this > an additional filter? When/why would we want that? It is an additional way to specify a predicate/condition to UPDATE on. There might be a kind of redundancy, if you decided to repeat the constrained values in the predicate too, but if you're using the WHERE clause sensibly there shouldn't be. So your UPDATE's "full predicate" is sort of the union of the constrained values that the conflict path was taken for, plus whatever you put in the WHERE clause, but not quite because they're evaluated at different times (as explained within transaction-iso.html). > How would you do "if colA = 3 then ignore else update"? Technically, you can't do that exact thing. IGNORE is just for quickly dealing with ETL-type problems (and it is reasonable to use it without one particular unique index in mind, unlike ON CONFLICT UPDATE) - think pgloader. 
But if you did this: INSERT INTO tab(colB) values('foo') ON CONFLICT UPDATE set colB = CONFLICTING(colB) WHERE colA != 3 Then you would achieve almost the same thing. You wouldn't have inserted or updated anything if the only rows considered had a colA of 3, but any such rows considered would be locked, which isn't the same as IGNOREing them. > No explanation of why the CONFLICTING() syntax differs from OLD./NEW. > syntax used in triggers Why should it be the same? > The page makes no mention of the upsert problem, nor is any previous > code mentioned. What's the upsert problem? I mean, apart from the fact that we don't have it. Note that it is documented that one of the two outcomes is guaranteed. I should have updated the plpgsql looping subxact example, though. -- Peter Geoghegan
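The conditional-update semantics of the `WHERE colA != 3` example above can be approximated with SQLite's upsert syntax (added much later, in SQLite 3.24) via Python's sqlite3 module. This is only a sketch: the proposed Postgres syntax, the CONFLICTING() construct, and the row-locking behaviour all differ, and the table and column names are the hypothetical ones from the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (colA INTEGER PRIMARY KEY, colB TEXT)")
conn.execute("INSERT INTO tab VALUES (3, 'keep'), (4, 'old')")

# Conflict on colA = 3: the WHERE clause is not satisfied, so the
# existing row is left untouched (in SQLite nothing happens at all;
# under the proposed Postgres semantics the row would still be locked).
conn.execute("""
    INSERT INTO tab(colA, colB) VALUES (3, 'foo')
    ON CONFLICT(colA) DO UPDATE SET colB = excluded.colB
    WHERE tab.colA != 3
""")

# Conflict on colA = 4: the WHERE clause passes, so the update proceeds.
conn.execute("""
    INSERT INTO tab(colA, colB) VALUES (4, 'foo')
    ON CONFLICT(colA) DO UPDATE SET colB = excluded.colB
    WHERE tab.colA != 3
""")

print(conn.execute("SELECT colA, colB FROM tab ORDER BY colA").fetchall())
# -> [(3, 'keep'), (4, 'foo')]
```

Note SQLite's `excluded.colB` plays the role the proposed `CONFLICTING(colB)` plays in the example being discussed.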
On 09/28/2014 09:40 AM, Peter Geoghegan wrote: >> No explanation of why the CONFLICTING() syntax differs from OLD./NEW. >> syntax used in triggers > > Why should it be the same? Both can be seen as cases where you refer to a field of a tuple, which is usually done with FOO.bar. -- Andreas Karlsson
On 09/28/2014 03:40 PM, Peter Geoghegan wrote: > Do you really think I could get an entire chapter out of this? Yes. It might be a short chapter, but once you extract the existing upsert example from the docs and show why the naïve approach doesn't work, there'll be enough to go on. People get this wrong a *lot*. http://www.postgresql.org/docs/current/static/plpgsql-control-structures.html#PLPGSQL-ERROR-TRAPPING http://www.depesz.com/2012/06/10/why-is-upsert-so-complicated/ http://stackoverflow.com/q/17267417/398670 http://stackoverflow.com/q/1109061/398670 I'm happy to help with documenting it. -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
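The looping pattern referred to above (the documented plpgsql subxact example) looks roughly like this when transliterated to client-side code. This is a sketch only, using SQLite as a stand-in for Postgres; it shows why ad-hoc upsert needs a retry loop at all, and it is still not safe against every interleaving in a real concurrent setting:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE upsert (key INTEGER PRIMARY KEY, val TEXT)")


def naive_upsert(conn, key, val, max_retries=3):
    # The classic "loop until one path wins" pattern: try UPDATE first;
    # if no row matched, try INSERT; if a concurrent session inserted
    # the key in between, the INSERT raises a unique violation and we
    # must retry the UPDATE.
    for _ in range(max_retries):
        cur = conn.execute("UPDATE upsert SET val = ? WHERE key = ?",
                           (val, key))
        if cur.rowcount > 0:
            return "updated"
        try:
            conn.execute("INSERT INTO upsert(key, val) VALUES (?, ?)",
                         (key, val))
            return "inserted"
        except sqlite3.IntegrityError:
            continue  # lost the race: someone else inserted the key
    raise RuntimeError("upsert did not converge")


print(naive_upsert(conn, 1, "insert"))  # inserted
print(naive_upsert(conn, 1, "update"))  # updated
```

The whole point of the patch under discussion is to make this loop (and its subtle failure modes) unnecessary.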
On 28 September 2014 08:40, Peter Geoghegan <pg@heroku.com> wrote: > On Sat, Sep 27, 2014 at 11:21 PM, Simon Riggs <simon@2ndquadrant.com> wrote: >> My request was for the following... >> >> Agree command semantics by producing these things >> * Explanatory documentation (Ch6.4 Data Manipulation - Upsert) ... > INSERT INTO tab(colB) values('foo') ON CONFLICT UPDATE set colB = > CONFLICTING(colB) WHERE colA != 3 > > Then you would achieve almost the same thing. You wouldn't have > inserted or updated anything if the only rows considered had a colA of > 3, but any such rows considered would be locked, which isn't the same > as IGNOREing them. > >> No explanation of why the CONFLICTING() syntax differs from OLD./NEW. >> syntax used in triggers > > Why should it be the same? Good question. What could be wrong with making up new syntax? The obvious answer is because we would simply have nothing to guide us. No principles that can be applied, just opinions. My considered opinion is that the above syntax is * non-standard * inconsistent with what we have elsewhere * an additional item for implementors to handle I could use more emotive words here, but the above should suffice to cover my unease at inventing new SQL constructs. This is Postgres. What worries me the most is that ORM implementors everywhere will simply ignore our efforts, leaving us with something we'd much rather we didn't have. As a possible committer of this feature, I would not wish to put my name to that. You will need a committer who will do that. Which brings me back to the SQL Standard, which is MERGE. We already know the MERGE command does not fully and usefully define its concurrent behaviour; I raised this 6 years ago. It's not clear to me that we couldn't more closely define the behaviour for a subset of the command. If we implement MERGE, then we will help ORM developers do less work to support Postgres, which will encourage adoption. 
My proposal would be to implement only a very limited syntax for MERGE in this release, replacing this > INSERT INTO tab(colB) values('foo') ON CONFLICT UPDATE set colB = > CONFLICTING(colB) WHERE colA != 3 with this... MERGE INTO tab USING VALUES ('foo') WHEN NOT MATCHED THEN INSERT (colB) WHEN MATCHED THEN UPDATE SET colB = NEW.p1 and throwing "ERROR: full syntax for MERGE not implemented yet" if people stretch too far. If there is some deviation from the standard, it can be explained clearly, though I don't see we would need to do that - we can extend beyond the standard to explain the concurrent behaviour. And we will be a lot closer to getting full MERGE also. Doing MERGE syntax is probably about 2 weeks work, which is better than 2 weeks per ORM to support the new Postgres-only syntax. Thanks for your efforts to bring this to a conclusion. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Sun, Sep 28, 2014 at 1:17 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > MERGE INTO tab USING VALUES ('foo') > WHEN NOT MATCHED THEN > INSERT (colB) > WHEN MATCHED THEN > UPDATE SET colB = NEW.p1 > > and throwing "ERROR: full syntax for MERGE not implemented yet" if > people stretch too far. That isn't the MERGE syntax either. Where is the join? I've extensively discussed why I think we should avoid calling something upsert-like MERGE, as you know: http://www.postgresql.org/message-id/flat/CAM3SWZRP0c3g6+aJ=YYDGYAcTZg0xA8-1_FCVo5Xm7hrEL34kw@mail.gmail.com#CAM3SWZRP0c3g6+aJ=YYDGYAcTZg0xA8-1_FCVo5Xm7hrEL34kw@mail.gmail.com We *should* have a MERGE feature, but one that serves the actual MERGE use-case well. That is an important use-case; it just isn't the one I'm interested in right now. FWIW, I agree that it wouldn't be much work to do this - what you present here really is just a different syntax for what I have here (which isn't MERGE). I think it would be counter-productive to pursue this, though. Also, what about limiting the unique indexes under consideration? There was informal meeting of this at the dev meeting a in 2012. -- Peter Geoghegan
On Sun, Sep 28, 2014 at 1:31 PM, Peter Geoghegan <pg@heroku.com> wrote: > There was informal meeting of this at the dev meeting a in 2012. I mean: There was informal agreement that as long as we're working on a feature that makes useful, UPSERT-like guarantees, we shouldn't use the MERGE syntax. MERGE clearly benefits (in ways only relevant to the use-case it targets) from having the leeway to not care about what someone with the UPSERT use-case would call race conditions. -- Peter Geoghegan
On 29/09/14 09:31, Peter Geoghegan wrote: > On Sun, Sep 28, 2014 at 1:17 PM, Simon Riggs <simon@2ndquadrant.com> wrote: >> MERGE INTO tab USING VALUES ('foo') >> WHEN NOT MATCHED THEN >> INSERT (colB) >> WHEN MATCHED THEN >> UPDATE SET colB = NEW.p1 >> >> and throwing "ERROR: full syntax for MERGE not implemented yet" if >> people stretch too far. > That isn't the MERGE syntax either. Where is the join? > > I've extensively discussed why I think we should avoid calling > something upsert-like MERGE, as you know: > http://www.postgresql.org/message-id/flat/CAM3SWZRP0c3g6+aJ=YYDGYAcTZg0xA8-1_FCVo5Xm7hrEL34kw@mail.gmail.com#CAM3SWZRP0c3g6+aJ=YYDGYAcTZg0xA8-1_FCVo5Xm7hrEL34kw@mail.gmail.com > > We *should* have a MERGE feature, but one that serves the actual MERGE > use-case well. That is an important use-case; it just isn't the one > I'm interested in right now. > > FWIW, I agree that it wouldn't be much work to do this - what you > present here really is just a different syntax for what I have here > (which isn't MERGE). I think it would be counter-productive to pursue > this, though. Also, what about limiting the unique indexes under > consideration? > > There was informal meeting of this at the dev meeting a in 2012. > How about have a stub page for MERGE, saying it is not implemented yet, but how about considering UPSERT - or something of that nature? I can suspect that people are much more likely to look for 'MERGE' in an index, or look for 'MERGE' in the list of SQL commands, than 'UPSERT'. Cheers, Gavin
On Sun, Sep 28, 2014 at 3:41 PM, Gavin Flower <GavinFlower@archidevsys.co.nz> wrote: > How about have a stub page for MERGE, saying it is not implemented yet, but > how about considering UPSERT - or something of that nature? > > I can suspect that people are much more likely to look for 'MERGE' in an > index, or look for 'MERGE' in the list of SQL commands, than 'UPSERT'. Seems reasonable. What I have a problem with is using the MERGE syntax to match people's preexisting confused ideas about what MERGE does. If we do that, it'll definitely bite us when we go to make what we'd be calling MERGE do what MERGE is actually supposed to do. I favor clearly explaining that. -- Peter Geoghegan
On 29/09/14 11:57, Peter Geoghegan wrote: > On Sun, Sep 28, 2014 at 3:41 PM, Gavin Flower > <GavinFlower@archidevsys.co.nz> wrote: >> How about have a stub page for MERGE, saying it is not implemented yet, but >> how about considering UPSERT - or something of that nature? >> >> I can suspect that people are much more likely to look for 'MERGE' in an >> index, or look for 'MERGE' in the list of SQL commands, than 'UPSERT'. > Seems reasonable. > > What I have a problem with is using the MERGE syntax to match people's > preexisting confused ideas about what MERGE does. If we do that, it'll > definitely bite us when we go to make what we'd be calling MERGE do > what MERGE is actually supposed to do. I favor clearly explaining > that. > Opinionated I may be, but I wanted stay well clear of the syntax minefield in this area - as I still have at least a vestigial instinct for self preservation! :-)
On Sun, Sep 28, 2014 at 6:15 PM, Gavin Flower <GavinFlower@archidevsys.co.nz> wrote: >> What I have a problem with is using the MERGE syntax to match people's >> preexisting confused ideas about what MERGE does. If we do that, it'll >> definitely bite us when we go to make what we'd be calling MERGE do >> what MERGE is actually supposed to do. I favor clearly explaining >> that. >> > Opinionated I may be, but I wanted stay well clear of the syntax minefield > in this area - as I still have at least a vestigial instinct for self > preservation! :-) To be clear: I don't think Simon is confused about this at all, which is why I'm surprised that he suggested it. -- Peter Geoghegan
On 29/09/14 14:20, Peter Geoghegan wrote: > On Sun, Sep 28, 2014 at 6:15 PM, Gavin Flower > <GavinFlower@archidevsys.co.nz> wrote: >>> What I have a problem with is using the MERGE syntax to match people's >>> preexisting confused ideas about what MERGE does. If we do that, it'll >>> definitely bite us when we go to make what we'd be calling MERGE do >>> what MERGE is actually supposed to do. I favor clearly explaining >>> that. >>> >> Opinionated I may be, but I wanted stay well clear of the syntax minefield >> in this area - as I still have at least a vestigial instinct for self >> preservation! :-) > To be clear: I don't think Simon is confused about this at all, which > is why I'm surprised that he suggested it. > > More specifically, I have only lightly read this thread - and while I think the functionality is useful, I have not thought about it in any real depth. I was thinking more along the lines that if I needed functionality like this, where & how might I look for it. I was remembering my problems looking up syntax in COBOL after coming from FORTRAN (& other languages) - some concepts had different names and the philosophy was significantly different in places. The relevance here is that people's background in other DBMS & knowledge of SQL standards affect what they expect, as well as preventing unnecessary conflicts between PostgreSQL & SQL standards (as far as is practicable & sensible).
On 09/29/2014 06:41 AM, Gavin Flower wrote: > > I can suspect that people are much more likely to look for 'MERGE' in an > index, or look for 'MERGE' in the list of SQL commands, than 'UPSERT'. and/or to be looking for MySQL's: ON DUPLICATE KEY {IGNORE|UPDATE} What astonishes me when I look around at how other RDBMS users solve this is how many of them completely ignore concurrency issues. e.g. in this SO question: http://stackoverflow.com/q/108403/398670 there's an alarming lack of concern for concurrency, just a couple of links to : http://www.mssqltips.com/sqlservertip/3074/use-caution-with-sql-servers-merge-statement/ (BTW, that article contains some useful information about corner cases any upsert approach should test and deal with). Similar with Oracle: Alarming lack of concern for concurrency among users: http://stackoverflow.com/q/237327/398670 Useful article: http://michaeljswart.com/2011/09/mythbusting-concurrent-updateinsert-solutions/ -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Sun, Sep 28, 2014 at 8:53 PM, Craig Ringer <craig@2ndquadrant.com> wrote: > there's an alarming lack of concern for concurrency, just a couple of > links to : > > http://www.mssqltips.com/sqlservertip/3074/use-caution-with-sql-servers-merge-statement/ > > (BTW, that article contains some useful information about corner cases > any upsert approach should test and deal with). Did you find some of those links from my pgCon slides, or independently? I'm well aware of those issues, FWIW. Avoiding repeating the mistakes of others is something that I thought about from an early stage. -- Peter Geoghegan
On 09/29/2014 12:03 PM, Peter Geoghegan wrote: > On Sun, Sep 28, 2014 at 8:53 PM, Craig Ringer <craig@2ndquadrant.com> wrote: >> there's an alarming lack of concern for concurrency, just a couple of >> links to : >> >> http://www.mssqltips.com/sqlservertip/3074/use-caution-with-sql-servers-merge-statement/ >> >> (BTW, that article contains some useful information about corner cases >> any upsert approach should test and deal with). > > Did you find some of those links from my pgCon slides, or > independently? I'm well aware of those issues, FWIW. Avoiding > repeating the mistakes of others is something that I thought about > from an early stage. Independently. I'm very glad to see you've looked over those issues. -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 09/28/2014 11:31 PM, Peter Geoghegan wrote: > On Sun, Sep 28, 2014 at 1:17 PM, Simon Riggs <simon@2ndquadrant.com> wrote: >> MERGE INTO tab USING VALUES ('foo') >> WHEN NOT MATCHED THEN >> INSERT (colB) >> WHEN MATCHED THEN >> UPDATE SET colB = NEW.p1 >> >> and throwing "ERROR: full syntax for MERGE not implemented yet" if >> people stretch too far. > > That isn't the MERGE syntax either. Where is the join? > > I've extensively discussed why I think we should avoid calling > something upsert-like MERGE, as you know: > http://www.postgresql.org/message-id/flat/CAM3SWZRP0c3g6+aJ=YYDGYAcTZg0xA8-1_FCVo5Xm7hrEL34kw@mail.gmail.com#CAM3SWZRP0c3g6+aJ=YYDGYAcTZg0xA8-1_FCVo5Xm7hrEL34kw@mail.gmail.com > > We *should* have a MERGE feature, but one that serves the actual MERGE > use-case well. That is an important use-case; it just isn't the one > I'm interested in right now. I agree we should not use the MERGE keyword for this. The upsert feature has tighter concurrency requirements than the SQL MERGE command, and that might come back to bite us. It would be highly confusing if some variants of MERGE are concurrency-safe and others are not, but if we now promise that our MERGE command is always concurrency-safe, that promise might be difficult to keep for the full MERGE syntax, and for whatever extensions the SQL committee comes up in the future. That said, it would be handy if the syntax was closer to MERGE. Aside from the concurrency issues, it does the same thing, right? So how about making the syntax identical to MERGE, except for swapping the MERGE keyword with e.g. UPSERT? - Heikki
On 2014-09-29 09:51:45 +0300, Heikki Linnakangas wrote: > That said, it would be handy if the syntax was closer to MERGE. Aside from > the concurrency issues, it does the same thing, right? So how about making > the syntax identical to MERGE, except for swapping the MERGE keyword with > e.g. UPSERT? I don't think that's a good idea. What most people are missing is an *easy* way to do upsert, that's similar to the normal INSERT. Not something with a pretty different syntax. That's why INSERT OR REPLACE and stuff like that was well adopted. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
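A concrete illustration of the INSERT OR REPLACE family mentioned above, and of why it is not quite an upsert: in SQLite, OR REPLACE is delete-then-insert, so any column not listed in the INSERT is clobbered rather than preserved. A standalone sketch (the table and column names are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (k INTEGER PRIMARY KEY, a TEXT, b TEXT)")
conn.execute("INSERT INTO t VALUES (1, 'a1', 'b1')")

# INSERT OR REPLACE deletes the conflicting row and inserts a fresh one,
# so column b -- not mentioned in the statement -- becomes NULL.
conn.execute("INSERT OR REPLACE INTO t(k, a) VALUES (1, 'a2')")

print(conn.execute("SELECT a, b FROM t WHERE k = 1").fetchone())
# -> ('a2', None)
```

An ON CONFLICT UPDATE that only sets `a` would instead have preserved `b`, which is part of why the easy-to-type syntax and the correct semantics are distinct design questions.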
On 28 September 2014 21:31, Peter Geoghegan <pg@heroku.com> wrote: > On Sun, Sep 28, 2014 at 1:17 PM, Simon Riggs <simon@2ndquadrant.com> wrote: >> MERGE INTO tab USING VALUES ('foo') >> WHEN NOT MATCHED THEN >> INSERT (colB) >> WHEN MATCHED THEN >> UPDATE SET colB = NEW.p1 >> >> and throwing "ERROR: full syntax for MERGE not implemented yet" if >> people stretch too far. > I've extensively discussed why I think we should avoid calling > something upsert-like MERGE, as you know: > http://www.postgresql.org/message-id/flat/CAM3SWZRP0c3g6+aJ=YYDGYAcTZg0xA8-1_FCVo5Xm7hrEL34kw@mail.gmail.com#CAM3SWZRP0c3g6+aJ=YYDGYAcTZg0xA8-1_FCVo5Xm7hrEL34kw@mail.gmail.com > > We *should* have a MERGE feature, but one that serves the actual MERGE > use-case well. That is an important use-case; it just isn't the one > I'm interested in right now. > > FWIW, I agree that it wouldn't be much work to do this - what you > present here really is just a different syntax for what I have here > (which isn't MERGE). I think it would be counter-productive to pursue > this, though. Also, what about limiting the unique indexes under > consideration? > > There was informal meeting of this at the dev meeting a in 2012. I agreed with the initial proposition to go for a different syntax. Now that I see the new syntax, I have changed my mind. The new syntax is much worse, I am sorry to say. MERGE standard does not offer guidance on concurrent effects, but there is no confusion as to how it works. We can impose our own concurrency rules since those are not covered by the standard. These are quite clear for single row inputs anyway, i.e. a VALUES clause. > > That isn't the MERGE syntax either. Where is the join? > There doesn't need to be one. INSERT assumes that if a column list is not mentioned then the VALUES clause is joined directly to the table, so we can do the same thing for MERGE. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 29 September 2014 02:20, Peter Geoghegan <pg@heroku.com> wrote: > On Sun, Sep 28, 2014 at 6:15 PM, Gavin Flower > <GavinFlower@archidevsys.co.nz> wrote: >>> What I have a problem with is using the MERGE syntax to match people's >>> preexisting confused ideas about what MERGE does. If we do that, it'll >>> definitely bite us when we go to make what we'd be calling MERGE do >>> what MERGE is actually supposed to do. I favor clearly explaining >>> that. >>> >> Opinionated I may be, but I wanted stay well clear of the syntax minefield >> in this area - as I still have at least a vestigial instinct for self >> preservation! :-) > > To be clear: I don't think Simon is confused about this at all, which > is why I'm surprised that he suggested it. At this point, I started to discuss MERGE again, but let me stop because there is a wider issue. These threads are littered with references that go nowhere. Links back to an email where you said the same thing two years ago are not proof that it's a bad idea. You need to carefully explain things in detail in one place to allow people to make up their own minds, not just re-assert it endlessly and claim 3 friends also agree, while everyone else searches desperately for what the actual reasons are. Lists of problems with the MERGE statement, with examples, are what is needed to convince and keep us convinced. Then full documentation on the proposed solution, so we can see that also. Please go to some trouble to tidy things up so we have clarity that *we* can see and decide for ourselves whether or not you are correct. Thanks -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 29 September 2014 08:02, Andres Freund <andres@2ndquadrant.com> wrote: > On 2014-09-29 09:51:45 +0300, Heikki Linnakangas wrote: >> That said, it would be handy if the syntax was closer to MERGE. Aside from >> the concurrency issues, it does the same thing, right? So how about making >> the syntax identical to MERGE, except for swapping the MERGE keyword with >> e.g. UPSERT? > > I don't think that's a good idea. What most people are missing is an > *easy* way to do upsert, that's similar to the normal INSERT. Not > something with a pretty different syntax. That's why INSERT OR REPLACE > and stuff like that was well adopted. We have 3 choices... 1. SQL Standard MERGE (or a subset) 2. MySQL Compatible syntax 3. Something completely different If we go for (3), I would like to see a long and detailed explanation of what is wrong with (1) and (2) before we do (3). That needs to be clear, detailed, well researched, correct and agreed. Otherwise when we release such a feature, people will ask, why did you do that? And yet nobody will remember. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 09/29/2014 05:10 PM, Simon Riggs wrote: > > Please go to some trouble to tidy things up so we have clarity that > *we* can see and decide for ourselves whether or not you are correct. Are you suggesting a wiki page to document the issues, discussions around each issue, etc? A summary mail? Something else? We have https://wiki.postgresql.org/wiki/SQL_MERGE but it's outdated, pretty sparse, and not really about the current work. -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 29 September 2014 10:27, Craig Ringer <craig@2ndquadrant.com> wrote: > On 09/29/2014 05:10 PM, Simon Riggs wrote: >> >> Please go to some trouble to tidy things up so we have clarity that >> *we* can see and decide for ourselves whether or not you are correct. > > Are you suggesting a wiki page to document the issues, discussions > around each issue, etc? A summary mail? Something else? Something that can be edited to keep it up to date, yes. > We have https://wiki.postgresql.org/wiki/SQL_MERGE but it's outdated, > pretty sparse, and not really about the current work. I rest my case. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 28 September 2014 08:40, Peter Geoghegan <pg@heroku.com> wrote: > On Sat, Sep 27, 2014 at 11:21 PM, Simon Riggs <simon@2ndquadrant.com> wrote: >> My request was for the following... >> >> Agree command semantics by producing these things >> * Explanatory documentation (Ch6.4 Data Manipulation - Upsert) > > Do you really think I could get an entire chapter out of this? If you were an ORM developer reading the PostgreSQL Release Notes for 9.5, which URL would you visit to see a complete description of the new feature, including how it works concurrently, locking and other aspects. How would you check whether some strange behaviour was a bug, or intentional? The new docs are scattered across many pages and there are very few examples. It was very difficult to read like that. >> * SQL Reference Documentation (INSERT) >> * Test cases for feature >> * Test cases for concurrency > > All of these were added. There are two new sets of isolation tests, > one per variant of the new clause (IGNORE/UPDATE). When you say "added", what do you mean? You posted one new doc patch, with no tests in it. >> Question arising: do you need to specify location criteria, or is this >> an additional filter? When/why would we want that? > > It is an additional way to specify a predicate/condition to UPDATE on. > There might be a kind of redundancy, if you decided to repeat the > constrained values in the predicate too, but if you're using the WHERE > clause sensibly there shouldn't be. So your UPDATE's "full predicate" > is sort of the union of the constrained values that the conflict path > was taken for, plus whatever you put in the WHERE clause, but not > quite because they're evaluated at different times (as explained > within transaction-iso.html). I think we should leave that out of the first commit. I'm not sure why that exists. If you wish to push down that route, then I recommend using the MERGE syntax because it caters for this much better than this. 
>> How would you do "if colA = 3 then ignore else update"? > > Technically, you can't do that exact thing. IGNORE is just for quickly > dealing with ETL-type problems (and it is reasonable to use it without > one particular unique index in mind, unlike ON CONFLICT UPDATE) - > think pgloader. But if you did this: > > INSERT INTO tab(colB) values('foo') ON CONFLICT UPDATE set colB = > CONFLICTING(colB) WHERE colA != 3 > > Then you would achieve almost the same thing. You wouldn't have > inserted or updated anything if the only rows considered had a colA of > 3, but any such rows considered would be locked, which isn't the same > as IGNOREing them. > >> No explanation of why the CONFLICTING() syntax differs from OLD./NEW. >> syntax used in triggers > > Why should it be the same? Because it would be a principled approach to do that. If we aren't going to use MERGE syntax, it would make sense to at least use the same terminology. e.g. INSERT .... WHEN MATCHED UPDATE The concept of "matched" is identical between MERGE and UPSERT and it will be confusing to have two words for the same thing. There seems to be a good reason not to use the MySQL syntax of ON DUPLICATE KEY UPDATE, which doesn't allow you to specify UPDATE operations other than a replace, so no deltas, e.g. SET a = a + x Having said that, it would be much nicer to have a mode that allows you to just say the word "UPDATE" and have it copy the data into the correct columns, like MySQL does. That is very intuitive, even if it isn't very flexible. >> The page makes no mention of the upsert problem, nor is any previous >> code mentioned. > > What's the upsert problem? I mean, apart from the fact that we don't > have it. Note that it is documented that one of the two outcomes is > guaranteed. > > I should have updated the plpgsql looping subxact example, though. That's what I meant. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
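The "delta" form mentioned above (SET a = a + x) is the classic counter-upsert. A sketch of what that looks like in practice, using SQLite's upsert syntax as a stand-in (the MySQL ON DUPLICATE KEY UPDATE form differs, and the table and column names here are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE counters (k TEXT PRIMARY KEY, n INTEGER)")


def bump(conn, key, by):
    # Delta update: on conflict, add to the existing value rather than
    # replacing it -- precisely the case a bare "replace the row" mode
    # cannot express.
    conn.execute("""
        INSERT INTO counters(k, n) VALUES (?, ?)
        ON CONFLICT(k) DO UPDATE SET n = n + excluded.n
    """, (key, by))


bump(conn, "hits", 1)   # first call inserts (hits, 1)
bump(conn, "hits", 4)   # second call updates to 1 + 4
print(conn.execute("SELECT n FROM counters WHERE k = 'hits'").fetchone()[0])
# -> 5
```

Any syntax that only supports copying the proposed row's values (rather than arbitrary UPDATE expressions) rules this pattern out, which is the flexibility point being argued here.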
On Fri, Sep 26, 2014 at 5:40 PM, Peter Geoghegan <pg@heroku.com> wrote: > I will be frank. Everyone knows that the nbtree locking parts of this > are never going to be committed over your objections. It cannot > happen. And yet, I persist in proposing that we go that way. I may be > stubborn, but I am not so stubborn that I'd jeopardize all the work > I've put into this to save one aspect of it that no one really cares > about anyway (even I only care about meeting my goals for user visible > behavior [2]). I may actually come up with a better way to make what > you outline work; then again, I may not. I have no idea, to be honest. > It's pretty clear that I'm going to have a hard time getting your > basic approach to value locking accepted without rethinking it a lot, > though. Can you really say that you won't have serious misgivings > about something like the "tuple->xmin = InvalidTransactionId" > swapping, if I actually formally propose it? That's very invasive to a > lot of places. And right now, I have no idea how we could do better. > > I really only want to get to where we have a design that's acceptable. > In all sincerity, I may yet be convinced to go your way. It's possible > that I've failed to fully understand your concerns. Is it really just > about making INSERT ... ON CONFLICT IGNORE work with exclusion > constraints (UPDATE clearly makes little sense)? I'll be frank, too. Heikki doesn't need to persuade you to go his way, because everyone other than yourself who has looked at this problem has come up with a design that looks like his. That includes, but is not limited to, every committer who has looked at this. The burden of proof is on you to convince everyone else that the promise tuple approach is wrong, not on everyone else to convince you that it's right. This is a community, and it operates by consensus. Your opinion, no matter how strongly held in the face of opposition, is not a consensus. 
As far as finding an option that's better than clearing the xmin, the point is not that we'd commit that design. Well, we might, if somebody does a careful audit of all the relevant code paths and makes a convincing argument that it's safe. But more likely, somebody will go find some other bit space that can be used to do this. The fact that it's not immediately obvious to you (or Heikki) where to find that bit-space is not a principled argument for changing the whole design. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Mon, Sep 29, 2014 at 8:31 AM, Robert Haas <robertmhaas@gmail.com> wrote: > I'll be frank, too. Heikki doesn't need to persuade you to go his > way, because everyone other than yourself who has looked at this > problem has come up with a design that looks like his. Andres suggested something that is very roughly comparable, perhaps. And that was it, really, except for your suggestion that I convinced you wasn't the best way forward (for unrelated reasons). > As far as finding an option that's better than clearing the xmin, the > point is not that we'd commit that design. Well, we might, if > somebody does a careful audit of all the relevant code paths and makes > a convincing argument that it's safe. But more likely, somebody will > go find some other bit space that can be used to do this. The fact > that it's not immediately obvious to you (or Heikki) where to find > that bit-space is not a principled argument for changing the whole > design. I never said that it was. *Obviously* I know that Heikki is not obligated to convince me of anything - I said as much. Whether or not Heikki is obligated to convince me is not the point, which is that it would be nice if he could convince me. I think that there are some serious issues with the promise tuples approach, and discussing those brings us closer to moving forward. -- Peter Geoghegan
On Mon, Sep 29, 2014 at 10:37 AM, Peter Geoghegan <pg@heroku.com> wrote:
>> But more likely, somebody will
>> go find some other bit space that can be used to do this.

My concerns have nothing to do with the availability of bit space, obviously.

--
Peter Geoghegan
On Mon, Sep 29, 2014 at 7:21 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > If you were an ORM developer reading the PostgreSQL Release Notes for > 9.5, which URL would you visit to see a complete description of the > new feature, including how it works concurrently, locking and other > aspects. How would you check whether some strange behaviour was a bug, > or intentional? We don't do that with UPDATE, so why would we do it with this? There is an existing structure to the documentation that needs to be respected. This is the case even though the EvalPlanQual() mechanism is a total Postgres-ism, which can potentially violate snapshot isolation (this is not true of Oracle's READ COMMITTED, for example). You have to go out of your way to find that out at the moment. But I know ORM authors, and the majority probably don't understand this stuff - that ought to be okay. >> All of these were added. There are two new sets of isolation tests, >> one per variant of the new clause (IGNORE/UPDATE). > > When you say "added", what do you mean? You posted one new doc patch, > with no tests in it. I mean that there was a commit (not included with the documentation, but with the original patchset) with many tests. I don't know why you're suggesting that I don't have "concurrency tests". There are isolation tests in that commit. There are also many regression tests. >> It is an additional way to specify a predicate/condition to UPDATE on. >> There might be a kind of redundancy, if you decided to repeat the >> constrained values in the predicate too, but if you're using the WHERE >> clause sensibly there shouldn't be. So your UPDATE's "full predicate" >> is sort of the union of the constrained values that the conflict path >> was taken for, plus whatever you put in the WHERE clause, but not >> quite because they're evaluated at different times (as explained >> within transaction-iso.html). > > I think we should leave that out of the first commit. I'm not sure why > that exists. 
> If you wish to push down that route, then I recommend
> using the MERGE syntax because it caters for this much better than
> this.

Why leave it out? People are going to "push the predicate into the targetlist" if I do, and the effect is exactly the same.

>>> No explanation of why the CONFLICTING() syntax differs from OLD./NEW.
>>> syntax used in triggers
>>
>> Why should it be the same?
>
> Because it would be a principled approach to do that.

That is just an assertion. The MERGE syntax doesn't use that either.

> If we aren't going to use MERGE syntax, it would make sense to at
> least use the same terminology.
>
> e.g.
> INSERT ....
> WHEN MATCHED
> UPDATE
>
> The concept of "matched" is identical between MERGE and UPSERT and it
> will be confusing to have two words for the same thing.

I don't care if we change the spelling to "WHEN MATCHED UPDATE/IGNORE". That seems fine. But MERGE is talking about a join, not the presence of a would-be duplicate violation.

> There seems to be a good reason not to use the MySQL syntax of ON
> DUPLICATE KEY UPDATE, which doesn't allow you to specify UPDATE
> operations other than a replace, so no deltas, e.g. SET a = a + x

That isn't true, actually. It clearly does.

> Having said that, it would be much nicer to have a mode that allows
> you to just say the word "UPDATE" and have it copy the data into the
> correct columns, like MySQL does. That is very intuitive, even if it
> isn't very flexible.

Multi-assignment updates (with or without CONFLICTING()) are supported, FWIW.

--
Peter Geoghegan
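For the record, the delta-style assignment at issue here is expressible in MySQL roughly as follows (table and column names are illustrative):

```sql
-- MySQL: VALUES(hits) refers to the value proposed for insertion,
-- so delta updates like hits + ... are possible
INSERT INTO counters (id, hits) VALUES (1, 10)
ON DUPLICATE KEY UPDATE hits = hits + VALUES(hits);
```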
On Mon, Sep 29, 2014 at 12:02 AM, Andres Freund <andres@2ndquadrant.com> wrote: > On 2014-09-29 09:51:45 +0300, Heikki Linnakangas wrote: >> That said, it would be handy if the syntax was closer to MERGE. Aside from >> the concurrency issues, it does the same thing, right? So how about making >> the syntax identical to MERGE, except for swapping the MERGE keyword with >> e.g. UPSERT? > > I don't think that's a good idea. What most people are missing is an > *easy* way to do upsert, that's similar to the normal INSERT. Not > something with a pretty different syntax. That's why INSERT OR REPLACE > and stuff like that was well adopted. Agreed. MERGE isn't the same other than the concurrency concerns, in any case. It is driven by a join, which is very flexible, but also has problems with concurrency (leaving aside the fact that in practice it doesn't tend to work out well when it isn't an equi-join). UPSERT *has* to be driven by something like a would-be unique violation, not an outer join matching or not matching. -- Peter Geoghegan
On Mon, Sep 29, 2014 at 2:27 AM, Craig Ringer <craig@2ndquadrant.com> wrote: >> Please go to some trouble to tidy things up so we have clarity that >> *we* can see and decide for ourselves whether or not you are correct. > > Are you suggesting a wiki page to document the issues, discussions > around each issue, etc? A summary mail? Something else? It isn't easy, Simon. I thought my big e-mail at the start of the thread was a summary. -- Peter Geoghegan
On Mon, Sep 29, 2014 at 2:14 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> 1. SQL Standard MERGE (or a subset)
> 2. MySQL Compatible syntax
> 3. Something completely different
>
> If we go for (3), I would like to see a long and detailed explanation
> of what is wrong with (1) and (2) before we do (3). That needs to be
> clear, detailed, well researched, correct and agreed. Otherwise when
> we release such a feature, people will ask, why did you do that? And
> yet nobody will remember.

My syntax is inspired by the MySQL one, with some influence from SQLite (SQLite has an ON CONFLICT REPLACE). I don't want to copy MySQL's use of VALUES() in the UPDATE targetlist - I spell the same concept as CONFLICTING(). I guess that otherwise they'd have to make the VALUES()/CONFLICTING() expression a whole new fully reserved keyword, and they preferred not to. Also, MySQL bizarrely omits the "SET" keyword within ON DUPLICATE KEY UPDATE. So I haven't copied it exactly, on aesthetic grounds.

I think that the actual reason for the latter wart (the SET omission) is that MySQL found it easier to write the grammar that way. Consider what we do here to make SET in an UPDATE work, despite the fact that it's a valid column name:

https://github.com/postgres/postgres/blob/REL9_4_STABLE/src/backend/parser/gram.y#L10141

So I wanted to suggest something similar but not identical to the MySQL syntax, with a bit more flexibility/safety. I thought that I could do so without emulating their warts. As I've mentioned, it isn't the MERGE syntax because that is quite a different thing. There is a place for it, but it's not strategically important in the same way as upsert is.

--
Peter Geoghegan
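Side by side, the two spellings of the same concept look roughly like this (illustrative table; the second form is the syntax proposed in this thread):

```sql
-- MySQL: VALUES() in the targetlist, with the SET keyword omitted
INSERT INTO tab (k, v) VALUES (1, 'new')
ON DUPLICATE KEY UPDATE v = VALUES(v);

-- This patch: CONFLICTING() plays the same role, and SET is retained
INSERT INTO tab (k, v) VALUES (1, 'new')
ON CONFLICT UPDATE SET v = CONFLICTING(v);
```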
Peter Geoghegan <pg@heroku.com> wrote:

> As I've mentioned, it isn't the MERGE syntax because that is
> quite a different thing. There is a place for it, but it's not
> strategically important in the same way as upsert is.

I think that the subset of the MERGE syntax that would be needed for UPSERT behavior would be as follows. For one row as literals:

MERGE INTO tab t
  USING (VALUES ('foo', 'p1')) new(id, colB)
  ON (t.id = new.id)
  WHEN MATCHED THEN
    UPDATE SET colB = new.colB
  WHEN NOT MATCHED THEN
    INSERT (id, colB) VALUES (new.id, new.colB);

If you have a bunch of rows in a "bar" table you want to merge in:

MERGE INTO tab t
  USING (SELECT id, colB FROM bar) b
  ON (t.id = b.id)
  WHEN MATCHED THEN
    UPDATE SET colB = b.colB
  WHEN NOT MATCHED THEN
    INSERT (id, colB) VALUES (b.id, b.colB);

I fail to see how this is harder or more problematic than the nonstandard suggestions that have been floated. I don't know why we would be even *considering* a nonstandard syntax rather than saying that only this subset is supported *so far*.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Mon, Sep 29, 2014 at 1:40 PM, Kevin Grittner <kgrittn@ymail.com> wrote:
> I think that the subset of the MERGE syntax that would be needed
> for UPSERT behavior would be as follows. For one row as literals:
>
> MERGE INTO tab t
>   USING (VALUES ('foo', 'p1')) new(id, colB)
>   ON (t.id = new.id)
>   WHEN MATCHED THEN
>     UPDATE SET colB = new.colB
>   WHEN NOT MATCHED THEN
>     INSERT (id, colB) VALUES (new.id, new.colB);
>
> If you have a bunch of rows in a "bar" table you want to merge in:
>
> MERGE INTO tab t
>   USING (SELECT id, colB FROM bar) b
>   ON (t.id = b.id)
>   WHEN MATCHED THEN
>     UPDATE SET colB = b.colB
>   WHEN NOT MATCHED THEN
>     INSERT (id, colB) VALUES (b.id, b.colB);
>
> I fail to see how this is harder or more problematic than the
> nonstandard suggestions that have been floated. I don't know why
> we would be even *considering* a nonstandard syntax rather than
> saying that only this subset is supported *so far*.

Heikki, Andres and I are against using MERGE for this, fwiw. Tom seemed to think so too, on previous occasions. It isn't a matter of alternative syntaxes. I have described in detail why I think it's a bad idea - I have linked to that about 3 times in this thread. It paints us into a corner when we go to make this do what MERGE is supposed to do.

Do you want a feature that, when fully generalized, plays a special visibility game based on whether or not some exact set of conditions are met? That is a non-starter, IMV. The whole idea of using an arbitrary join syntax seems great, but I need something that works backwards from would-be unique violations. That's the only way to preserve the UPSERT guarantees (atomicity, definite insert or update).

--
Peter Geoghegan
Peter Geoghegan <pg@heroku.com> wrote: > Heikki, Andres and I are against using MERGE for this, fwiw. Tom > seemed to think so too, on previous occasions. It isn't a matter > of alternative syntaxes. I have described in detail why I think > it's a bad idea - I have linked to that about 3 times in this > thread. Yeah, I read that, and I'm not convinced. > It paints us into a corner when we go to make this do what MERGE > is supposed to do. Do you want a feature that, when fully > generalized, plays a special visibility game based on whether or > not some exact set of conditions are met? That is a non-starter, > IMV. For other queries we use different access techniques not only based on the presence of an index, but on the state of the visibility map, degree of bloat, ordering of tuples in a heap, etc. -- so sure, I'm OK with different execution styles based on whether your join conditions match a unique index on columns that can't be NULL. > The whole idea of using an arbitrary join syntax seems great, > but I need something that works backwards from would-be unique > violations. That's the only way to preserve the UPSERT guarantees > (atomicity, definite insert or update). I absolutely don't buy that it is the *only way*. It is probably (by far) the *easiest* way, and doing so gets us a frequently- requested feature; but I think limiting the initial implementation to cases where the join conditions include equality tests on all columns of some appropriate unique index is fine, and doesn't seem to me to preclude further development of the MERGE feature for additional cases. In fact, I think having something to build on is a plus. The claims that you can't get a duplicate key error with an UPSERT are completely bogus, IMV. The *best* you can do is avoid them on the index used for matching (unless you're willing to ignore problem input rows or mangle the data in surprising ways to avoid such an error on a second unique index). 
With a fully functional MERGE syntax you could eventually gain the ability to write exceptions like that to the location of your choice (be it a table or WARNING messages in the log). -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
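A sketch of the second-unique-index scenario Kevin describes, using the proposed syntax and a hypothetical table with two unique constraints:

```sql
CREATE TABLE upsert (
    key INT  PRIMARY KEY,
    alt TEXT UNIQUE,
    val TEXT
);

INSERT INTO upsert VALUES (1, 'a', 'x');

-- The conflict path is decided by "key"; key = 2 does not conflict,
-- so an insert is attempted - but the new row collides with
-- alt = 'a', so a duplicate key error on the second unique index
-- would still be expected.
INSERT INTO upsert (key, alt, val) VALUES (2, 'a', 'y')
ON CONFLICT UPDATE SET val = 'y';
```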
On Mon, Sep 29, 2014 at 2:20 PM, Kevin Grittner <kgrittn@ymail.com> wrote: > The claims that you can't get a duplicate key error with an UPSERT > are completely bogus, IMV. The *best* you can do is avoid them on > the index used for matching (unless you're willing to ignore > problem input rows or mangle the data in surprising ways to avoid > such an error on a second unique index). That's what I meant. Doing any more than that isn't useful. I want to do exactly that - no more, no less. If you're still not convinced, then I think the fact that no MERGE implementation does what you want should be convincing. It is *horrifically* complicated to make what you want work, if indeed it is technically feasible at all. Isn't this already complicated enough? We use different access techniques as you say. We do not use different types of snapshots. That seems like a pretty fundamental distinction. -- Peter Geoghegan
Peter Geoghegan <pg@heroku.com> wrote:

> I think the fact that no MERGE implementation does what you want
> should be convincing. It is *horrifically* complicated to make
> what you want work, if indeed it is technically feasible at all.
> Isn't this already complicated enough?

What about the MERGE syntax I posted makes it hard to implement the statement validation and execution code you already have? (I'm asking about the UPSERT case only, not an implementation of all aspects of the standard syntax.)

To recap, in summary that would be:

MERGE INTO tablename [ alias ]
  USING ( relation ) [ alias ]
  ON ( boolean-expression )
  WHEN MATCHED THEN
    UPDATE SET target-column = expression
      [ , target-column = expression ] ...
  WHEN NOT MATCHED THEN
    INSERT ( target-columns ) VALUES ( expressions )

The initial implementation could restrict to these exact clauses and require that the boolean-expression used equality-quals on all columns of a unique index on only NOT NULL columns. I think the relation could be a VALUES clause or any SELECT statement without causing problems; do you think that would need to be constrained in some way? It would be wonderful if the expressions could be any arbitrary expressions assignable to the target columns; do you see a need to constrain that?

If we later expand the MERGE statement to more general cases, I don't see why statements of this form could not be treated as a special case. Personally, I'm dubious that we would want to compromise transactional integrity to achieve the broader case, but doubt that we would need to do so. I won't say it is just a SMOP, because there would need to be some careful design first. ;-)

> We use different access techniques as you say. We do not use
> different types of snapshots. That seems like a pretty
> fundamental distinction.

We use special types of snapshots in running DML that fires certain types of constraints, like FKs.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2014-09-29 14:57:45 -0700, Kevin Grittner wrote:
> Peter Geoghegan <pg@heroku.com> wrote:
>
>> I think the fact that no MERGE implementation does what you want
>> should be convincing. It is *horrifically* complicated to make
>> what you want work, if indeed it is technically feasible at all.
>> Isn't this already complicated enough?
>
> What about the MERGE syntax I posted makes it hard to implement the
> statement validation and execution code you already have? (I'm
> asking about the UPSERT case only, not an implementation of all
> aspects of the standard syntax.)
>
> To recap, in summary that would be:
>
> MERGE INTO tablename [ alias ]
>   USING ( relation ) [ alias ]
>   ON ( boolean-expression )
>   WHEN MATCHED THEN
>     UPDATE SET target-column = expression
>       [ , target-column = expression ] ...
>   WHEN NOT MATCHED THEN
>     INSERT ( target-columns ) VALUES ( expressions )
>
> The initial implementation could restrict to these exact clauses
> and require that the boolean-expression used equality-quals on all
> columns of a unique index on only NOT NULL columns.

That'll make it really hard to actually implement real MERGE.

Because suddenly there's no way for the user to know whether he's written an ON condition that can implement UPSERT-like properties (i.e. the *precise* column list of an index) or not.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Sep 29, 2014 at 3:02 PM, Andres Freund <andres@2ndquadrant.com> wrote: > That'll make it really hard to actually implement real MERGE. > > Because suddenly there's no way for the user to know whether he's > written a ON condition that can implement UPSERT like properties > (i.e. the *precise* column list of an index) or not. Exactly. The difficulty isn't doing what Kevin says so much as doing so and then at a later date taking that thing and making it into a fully featured MERGE. We'll be painted into a corner. That's bad, because as I've said I think we need MERGE too (just far less urgently). -- Peter Geoghegan
Andres Freund <andres@2ndquadrant.com> wrote: > On 2014-09-29 14:57:45 -0700, Kevin Grittner wrote: >> The initial implementation could restrict to these exact clauses >> and require that the boolean-expression used equality-quals on all >> columns of a unique index on only NOT NULL columns. > > That'll make it really hard to actually implement real MERGE. > > Because suddenly there's no way for the user to know whether he's > written a ON condition that can implement UPSERT like properties > (i.e. the *precise* column list of an index) or not. Well, unless we abandon transactional semantics for other MERGE statements, we should have a way that UPSERT logic continues to work if you don't match a suitable index; it will just be slower -- potentially a lot slower, but that's what indexes are for. I don't think we need a separate statement type for the one we "do well", because I don't think we should do the other one without proper transactional semantics. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2014-09-29 15:08:36 -0700, Kevin Grittner wrote: > Andres Freund <andres@2ndquadrant.com> wrote: > > On 2014-09-29 14:57:45 -0700, Kevin Grittner wrote: > > >> The initial implementation could restrict to these exact clauses > >> and require that the boolean-expression used equality-quals on all > >> columns of a unique index on only NOT NULL columns. > > > > That'll make it really hard to actually implement real MERGE. > > > > Because suddenly there's no way for the user to know whether he's > > written a ON condition that can implement UPSERT like properties > > (i.e. the *precise* column list of an index) or not. > > Well, unless we abandon transactional semantics for other MERGE > statements, we should have a way that UPSERT logic continues to > work if you don't match a suitable index; it will just be slower -- > potentially a lot slower, but that's what indexes are for. I don't > think we need a separate statement type for the one we "do well", > because I don't think we should do the other one without proper > transactional semantics. Wrong. You can't realistically implement the guarantees of UPSERT without a corresponding UNIQUE index. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> wrote: > Wrong. You can't realistically implement the guarantees of UPSERT > without a corresponding UNIQUE index. You definitely can do it; the question is what you consider reasonable in terms of development effort, performance, and concurrency. I think the problem can be solved with non-scary values of pretty much any two of those. I guess my assumption is that we won't handle the general case until someone wants to put the substantial development effort into making the other two acceptable. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2014-09-29 15:16:49 -0700, Kevin Grittner wrote: > Andres Freund <andres@2ndquadrant.com> wrote: > > > Wrong. You can't realistically implement the guarantees of UPSERT > > without a corresponding UNIQUE index. > > You definitely can do it; the question is what you consider > reasonable in terms of development effort, performance, and > concurrency. Right. You can exclusively lock the table and such. The point is just that nobody wants that. I.e. people want to be warned about it. > I think the problem can be solved with non-scary values of pretty much > any two of those. I guess my assumption is that we won't handle the > general case until someone wants to put the substantial development > effort into making the other two acceptable. Which would be a major loss because MERGE is rather useful outside of atomic upsert. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Sep 29, 2014 at 3:08 PM, Kevin Grittner <kgrittn@ymail.com> wrote: > Well, unless we abandon transactional semantics for other MERGE > statements, we should have a way that UPSERT logic continues to > work if you don't match a suitable index; it will just be slower -- > potentially a lot slower, but that's what indexes are for. I want an implementation that doesn't have unique violations, unprincipled deadlocks, or serialization failures at READ COMMITTED. I want it because that's what the majority of users actually want. It requires no theoretical justification. > I don't > think we need a separate statement type for the one we "do well", > because I don't think we should do the other one without proper > transactional semantics. That seems like a very impractical attitude. I cannot simulate what I've been doing with unique indexes without taking an exclusive table lock. That is a major footgun, so it isn't going to happen. -- Peter Geoghegan
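The table-lock emulation alluded to here would look something like the following sketch (illustrative names); the EXCLUSIVE lock blocks all concurrent writers until commit, which is exactly the footgun:

```sql
BEGIN;
LOCK TABLE tab IN EXCLUSIVE MODE;  -- serializes all concurrent DML
UPDATE tab SET v = 'new' WHERE k = 1;
INSERT INTO tab (k, v)
SELECT 1, 'new'
WHERE NOT EXISTS (SELECT 1 FROM tab WHERE k = 1);
COMMIT;
```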
On 09/30/2014 01:59 AM, Peter Geoghegan wrote: > On Mon, Sep 29, 2014 at 7:21 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >> If you were an ORM developer reading the PostgreSQL Release Notes for >> 9.5, which URL would you visit to see a complete description of the >> new feature, including how it works concurrently, locking and other >> aspects. How would you check whether some strange behaviour was a bug, >> or intentional? > > We don't do that with UPDATE, so why would we do it with this? There > is an existing structure to the documentation that needs to be > respected. I tend to agree, so long as there are appropriate cross-references. See, for example, how window function information was added. >This is the case even though the EvalPlanQual() mechanism > is a total Postgres-ism, which can potentially violate snapshot > isolation (this is not true of Oracle's READ COMMITTED, for example). That's useful to know, and certainly worth covering in the isolation portion of the docs. -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 09/30/2014 05:57 AM, Kevin Grittner wrote:
> Peter Geoghegan <pg@heroku.com> wrote:
>
>> I think the fact that no MERGE implementation does what you want
>> should be convincing. It is *horrifically* complicated to make
>> what you want work, if indeed it is technically feasible at all.
>> Isn't this already complicated enough?
>
> What about the MERGE syntax I posted makes it hard to implement the
> statement validation and execution code you already have? (I'm
> asking about the UPSERT case only, not an implementation of all
> aspects of the standard syntax.)

As I understand it, it isn't the syntax that's hard, it's the logic behind it. FWIW I'm pretty persuaded by the argument that:

* Other RDBMSes' MERGE implementations don't behave this way;
* MERGE is a join-based operation; it's not really the same as an upsert (though a join on a values-list is similar-ish);
* Making MERGE work for the concurrency-safe upsert case would render it harder to then support the rest of MERGE for the OLAP/data merging cases it's really specified for.

I also have a serious usability concern about re-purposing MERGE for this. I think it'll be confusing to have a MERGE that's usable as a concurrency-safe upsert and also as a non-concurrency-safe data merging operation with slightly different options. Borrowing from / closely following the MERGE syntax likely makes sense, but special-casing a subset of MERGE would IMO be a regrettable long-term decision.

> If we later expand the MERGE statement to more general cases, I
> don't see why statements of this form could not be treated as a
> special case.

Please, no. That's basically having two different kinds of statement with subtly different syntax differentiating them. Upsert is full of confusing and subtle behaviour.
Any implementation needs to focus on making it easy to get right, and I don't think having something where small syntax variations can cause you to silently trip out of the concurrency-safe mode of operation would meet that need. -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 09/30/2014 06:08 AM, Kevin Grittner wrote: > Well, unless we abandon transactional semantics for other MERGE > statements, we should have a way that UPSERT logic continues to > work if you don't match a suitable index; it will just be slower -- > potentially a lot slower, but that's what indexes are for. That would probably lead to MERGE taking different lock strengths based on index availability, having different failure modes, etc. The less internal magic inside what's already a complicated and confusing area for users, the better. > I don't > think we need a separate statement type for the one we "do well", > because I don't think we should do the other one without proper > transactional semantics. "Proper transactional semantics" isn't the same as "free from all forms of race condition". Sometimes you want or need to do things that can't be made concurrency-safe, or would perform unacceptably if done in a concurrency-safe manner. That's why we have LOCK TABLE, among other things. We have READ COMMITTED for a reason. We have SELECT without FOR SHARE for a reason. MERGE seems to be specified as more of an OLAP / ETL operation than an OLTP one, and I think we should probably respect that - and the way other RDBMSes have already implemented it. -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Peter Geoghegan <pg@heroku.com> wrote: > On Mon, Sep 29, 2014 at 3:08 PM, Kevin Grittner <kgrittn@ymail.com> wrote: >> Well, unless we abandon transactional semantics for other MERGE >> statements, we should have a way that UPSERT logic continues to >> work if you don't match a suitable index; it will just be slower -- >> potentially a lot slower, but that's what indexes are for. > > I want an implementation that doesn't have unique violations, > unprincipled deadlocks, or serialization failures at READ COMMITTED. I > want it because that's what the majority of users actually want. It > requires no theoretical justification. Sure. I'm not suggesting otherwise. >> I don't think we need a separate statement type for the one we >> "do well", because I don't think we should do the other one >> without proper transactional semantics. > > That seems like a very impractical attitude. I cannot simulate what > I've been doing with unique indexes without taking an exclusive table > lock. That is a major footgun, so it isn't going to happen. There are certainly other ways to do it, although they require more work. As far as UPSERT goes, I agree that we should require such an index, at least for the initial implementation and into the foreseeable future. What I'm saying is that if we implement it using the standard MERGE syntax, then if the features of MERGE are extended it will continue to work even in the absence of such an index. The index becomes a way of optimizing access rather than defining what access is allowed. At the risk of pushing people away from this POV, I'll point out that this is somewhat similar to what we do for unlogged bulk loads -- if all the conditions for doing it the fast way are present, we do it the fast way; otherwise it still works, but slower. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 29 September 2014 18:59, Peter Geoghegan <pg@heroku.com> wrote:
> On Mon, Sep 29, 2014 at 7:21 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> If you were an ORM developer reading the PostgreSQL Release Notes for 9.5, which URL would you visit to see a complete description of the new feature, including how it works concurrently, locking and other aspects? How would you check whether some strange behaviour was a bug, or intentional?
>
> We don't do that with UPDATE, so why would we do it with this?

Because this is new, harder and non-standard, so there is no other place to look. If you want to persuade us that MERGE has poorly defined concurrency, and that you have therefore implemented a new command, the new command had better have very well defined behaviour. And because a reviewer asked for it?

For example, this patch for UPSERT doesn't support updatable views. But I can't see how anyone that didn't read the patch would know that.

>>> All of these were added. There are two new sets of isolation tests, one per variant of the new clause (IGNORE/UPDATE).
>>
>> When you say "added", what do you mean? You posted one new doc patch, with no tests in it.
>
> I mean that there was a commit (not included with the documentation, but with the original patchset) with many tests. I don't know why you're suggesting that I don't have "concurrency tests". There are isolation tests in that commit. There are also many regression tests.

I see the tests in earlier patches; I was observing that there are no new ones. There are no tests for the use of CONFLICTING() syntax, and no tests for interaction with triggers, with regard to before triggers changing values prior to conflict detection. My hope was that the complex behaviour of multiple unique indexes might be explained there. Forgive me, I didn't see it.

>>>> No explanation of why the CONFLICTING() syntax differs from OLD./NEW. syntax used in triggers
>>>
>>> Why should it be the same?
>>
>> Because it would be a principled approach to do that.
>
> That is just an assertion. The MERGE syntax doesn't use that either.

MERGE allows "AS row", which then allows you to refer to row.x for column x of the input. Other people have independently commented the same thing.

>> If we aren't going to use MERGE syntax, it would make sense to at least use the same terminology.
>>
>> e.g.
>> INSERT ....
>> WHEN MATCHED
>> UPDATE
>>
>> The concept of "matched" is identical between MERGE and UPSERT and it will be confusing to have two words for the same thing.
>
> I don't care if we change the spelling to "WHEN MATCHED UPDATE/IGNORE". That seems fine. But MERGE is talking about a join, not the presence of a would-be duplicate violation.

I don't understand that comment.

>> There seems to be a good reason not to use the MySQL syntax of ON DUPLICATE KEY UPDATE, which doesn't allow you to specify UPDATE operations other than a replace, so no deltas, e.g. SET a = a + x
>
> That isn't true, actually. It clearly does.

It does. Rather amusingly, I misread the very unclear MySQL docs.

>> Having said that, it would be much nicer to have a mode that allows you to just say the word "UPDATE" and have it copy the data into the correct columns, like MySQL does. That is very intuitive, even if it isn't very flexible.
>
> Multi-assignment updates (with or without CONFLICTING()) are supported, FWIW.

If I want the incoming row to overwrite the old row, it would be good to have syntax to support that easily.

Why doesn't

INSERT INTO UNIQUE_TBL VALUES (1, 'a'), (2, 'b'), (2, 'b') ON CONFLICT UPDATE SET t = 'fails';

end up with this in the table?

1 a
2 fails

What happens with this?

BEGIN;
INSERT INTO UNIQUE_TBL VALUES (2, 'b') ON CONFLICT UPDATE SET t = 'fails';
INSERT INTO UNIQUE_TBL VALUES (2, 'b') ON CONFLICT UPDATE SET t = 'fails';
COMMIT;

-- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
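For reference, the CONFLICTING() construct discussed above is the patch's way of letting the auxiliary UPDATE refer to the incoming row's values. A sketch of the "overwrite the old row" case Simon asks for, assuming CONFLICTING() takes a column name in this form (the exact spelling here is an assumption based on this thread, not taken from the patch itself):

```sql
-- Hypothetical sketch against the WIP patch's syntax: on conflict,
-- overwrite the existing row's val with the incoming row's val.
INSERT INTO upsert(key, val) VALUES (1, 'insert')
ON CONFLICT UPDATE SET val = CONFLICTING(val);
```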
On Tue, Sep 30, 2014 at 8:30 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >>>>> No explanation of why the CONFLICTING() syntax differs from OLD./NEW. >>>>> syntax used in triggers >>>> >>>> Why should it be the same? >>> >>> Because it would be a principled approach to do that. >> >> That is just an assertion. The MERGE syntax doesn't use that either. > > MERGE allows "AS row" which then allow you to refer to row.x for > column x of the input. It does, but that isn't what you suggested. You talked about the OLD.*/NEW.* syntax. >> I don't care if we change the spelling to "WHEN MATCHED >> UPDATE/IGNORE". That seems fine. But MERGE is talking about a join, >> not the presence of a would-be duplicate violation. > > I don't understand that comment. I just mean that if you want to replace ON CONFLICT UPDATE with WHEN MATCHED UPDATE - that little part of the grammar - that seems okay. >> Multi-assignment updates (with or without CONFLICTING()) are supported, FWIW. > > If I want the incoming row to overwrite the old row, it would be good > to have syntax to support that easily. Well, maybe I'll get around to that when things settle down. That's clearly in the realm of "nice to have", though. > Why doesn't > INSERT INTO UNIQUE_TBL VALUES (1, 'a'), (2, 'b'), (2, 'b') ON > CONFLICT UPDATE SET t = 'fails'; > end up with this in the table? > > 1 a > 2 fails A "cardinality violation" - just like MERGE. As with MERGE, the final value of every row needs to be deterministic (within the command). > What happens with this? > > BEGIN; > INSERT INTO UNIQUE_TBL VALUES (2, 'b') ON CONFLICT UPDATE SET t = 'fails'; > INSERT INTO UNIQUE_TBL VALUES (2, 'b') ON CONFLICT UPDATE SET t = 'fails'; > COMMIT; It works fine. No cardinality violation with two separate commands. See the new ExecLockUpdateTuple() function within nodeModifyTable.c for extensive discussion on how this is handled. -- Peter Geoghegan
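The two answers above can be modeled with a toy in-memory upsert (plain Python for illustration, not PostgreSQL code): a single command may not affect the same row twice, while separate commands may each touch the row once.

```python
# Toy model of the cardinality-violation rule described above.
# A "table" is a dict keyed on the unique column; one call to
# upsert_command() stands in for one multi-row INSERT ... ON CONFLICT UPDATE.

class CardinalityViolation(Exception):
    """Raised when one command would affect the same row twice."""

def upsert_command(table, rows, update_val):
    touched = set()  # keys inserted or updated by *this* command
    for key, val in rows:
        if key in touched:
            # This command already affected this row: error, as with MERGE.
            raise CardinalityViolation(key)
        if key in table:
            table[key] = update_val   # the ON CONFLICT UPDATE arm
        else:
            table[key] = val          # the plain INSERT arm
        touched.add(key)

# Simon's first example: the third row conflicts with a row the same
# command just inserted, so the whole command fails.
try:
    upsert_command({}, [(1, 'a'), (2, 'b'), (2, 'b')], 'fails')
except CardinalityViolation:
    pass

# Simon's second example: two separate commands, so no cardinality
# violation; the second command simply updates the existing row.
t = {}
upsert_command(t, [(2, 'b')], 'fails')
upsert_command(t, [(2, 'b')], 'fails')
```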
On Tue, Sep 30, 2014 at 8:30 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 29 September 2014 18:59, Peter Geoghegan <pg@heroku.com> wrote:
>> On Mon, Sep 29, 2014 at 7:21 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>> If you were an ORM developer reading the PostgreSQL Release Notes for 9.5, which URL would you visit to see a complete description of the new feature, including how it works concurrently, locking and other aspects. How would you check whether some strange behaviour was a bug, or intentional?
>>
>> We don't do that with UPDATE, so why would we do it with this?
>
> Because this is new, harder and non-standard, so there is no other place to look. If you want to persuade us that MERGE has poorly defined concurrency, so you have implemented a new command, the new command had better have very well defined behaviour.

I'm making a point about the structure of the docs here. The behavior *is* documented, just not in the INSERT documentation - a situation I've compared with how EvalPlanQual() isn't discussed in the UPDATE/DELETE/SELECT FOR UPDATE docs. And EvalPlanQual() has some pretty surprising corner-case behaviors.

That having been said, maybe I could have gone into more detail on the "consensus among unique indexes" thing in another part of the documentation, since that isn't separately covered (only the aspects of when the predicate is evaluated in READ COMMITTED mode and other things like that were covered).

> For example, this patch for UPSERT doesn't support updatable views. But I can't see anyone that didn't read the patch would know that.

By reading the CREATE VIEW docs. Maybe there could stand to be a compatibility note in the main INSERT command, but I didn't want to do that as long as things were up in the air. It might be the case that we figure out good behavior for updatable views.

-- Peter Geoghegan
On 09/30/2014 11:20 AM, Peter Geoghegan wrote: >> > For example, this patch for UPSERT doesn't support updatable views. >> > But I can't see anyone that didn't read the patch would know that. > By reading the CREATE VIEW docs. Maybe there could stand to be a > compatibility note in the main INSERT command, but I didn't want to do > that as long as things were up in the air. It might be the case that > we figure out good behavior for updatable views. All of these things sound like good ideas for documentation improvements, but hardly anything which should block the patch. It has documentation, more than we'd require for a lot of other patches, and it's not like the 9.5 release is next month. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On 2014-09-30 11:49:21 -0700, Josh Berkus wrote: > On 09/30/2014 11:20 AM, Peter Geoghegan wrote: > >> > For example, this patch for UPSERT doesn't support updatable views. > >> > But I can't see anyone that didn't read the patch would know that. > > By reading the CREATE VIEW docs. Maybe there could stand to be a > > compatibility note in the main INSERT command, but I didn't want to do > > that as long as things were up in the air. It might be the case that > > we figure out good behavior for updatable views. > > All of these things sound like good ideas for documentation > improvements, but hardly anything which should block the patch. It has > documentation, more than we'd require for a lot of other patches, and > it's not like the 9.5 release is next month. What's blocking it is that (afaik) no committer agrees with the approach taken to solve the concurrency problems. And several (Heikki, Robert, me) have stated their dislike of the proposed approach. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 09/30/2014 11:51 AM, Andres Freund wrote: >> All of these things sound like good ideas for documentation >> > improvements, but hardly anything which should block the patch. It has >> > documentation, more than we'd require for a lot of other patches, and >> > it's not like the 9.5 release is next month. > What's blocking it is that (afaik) no committer agrees with the approach > taken to solve the concurrency problems. And several (Heikki, Robert, > me) have stated their dislike of the proposed approach. If that's what's blocking it then fine. But if we might change the concurrency approach, then what's the point in quibbling about docs? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On 09/30/2014 07:15 AM, Kevin Grittner wrote: > There are certainly other ways to do it, although they require more > work. As far as UPSERT goes, I agree that we should require such > an index, at least for the initial implementation and into the > foreseeable future. What I'm saying is that if we implement it > using the standard MERGE syntax, then if the features of MERGE are > extended it will continue to work even in the absence of such an > index. The index becomes a way of optimizing access rather than > defining what access is allowed. > > At the risk of pushing people away from this POV, I'll point out > that this is somewhat similar to what we do for unlogged bulk loads > -- if all the conditions for doing it the fast way are present, we > do it the fast way; otherwise it still works, but slower. Except that switching between fast/slow bulk loads affects *only* the speed of loading, not the locking rules. Having a statement silently take a full table lock when we were expecting it to be concurrent (because, for example, the index got rebuilt and someone forgot the UNIQUE) violates POLA from my perspective. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On Tue, Sep 30, 2014 at 11:51 AM, Andres Freund <andres@2ndquadrant.com> wrote: > What's blocking it is that (afaik) no committer agrees with the approach > taken to solve the concurrency problems. And several (Heikki, Robert, > me) have stated their dislike of the proposed approach. Well, it depends on what you mean by "approach to concurrency problems". It's not as if a consensus has emerged in favor of another approach, and if there is to be another approach, the details need to be worked out ASAP. Even still, I would appreciate it if people could review the patch on the assumption that those issues will be worked out. After all, there are plenty of other parts to this that have nothing to do with value locking - the entire "top half", which has significant subtleties (some involving concurrency) in its own right, reasonably well encapsulated from value locking. A couple of weeks ago, I felt good about the fact that it seemed "time was on my side" 9.5-wise, but maybe that isn't true. Working through the community process for this patch is going to be very difficult. I think everyone understands that there could be several ways of implementing value locking. I really do think it's a well encapsulated aspect of the patch, though, so even if you hate how I've implemented value locking, please try and give feedback on everything else. Simon wanted to start with the user-visible semantics, which makes sense, but I see no reason to limit it to that. -- Peter Geoghegan
On 2014-09-30 12:05:46 -0700, Peter Geoghegan wrote:
> On Tue, Sep 30, 2014 at 11:51 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> > What's blocking it is that (afaik) no committer agrees with the approach taken to solve the concurrency problems. And several (Heikki, Robert, me) have stated their dislike of the proposed approach.
>
> Well, it depends on what you mean by "approach to concurrency problems". It's not as if a consensus has emerged in favor of another approach, and if there is to be another approach, the details need to be worked out ASAP.

Well. People have given you outlines of approaches. And Heikki even gave you a somewhat working prototype. I don't think you can fairly expect more.

> Even still, I would appreciate it if people could review the patch on the assumption that those issues will be worked out.

Right now I don't really see the point. You've so far shown no inclination to accept significant concerns about your approach. And without an agreement about how to solve the concurrency issues the feature is dead in the water. And thus time spent reviewing isn't well spent. I'm pretty sure I'm not the only one feeling that way at this point.

> A couple of weeks ago, I felt good about the fact that it seemed "time was on my side" 9.5-wise, but maybe that isn't true. Working through the community process for this patch is going to be very difficult.

The community process involves accepting that your opinion isn't the community's. Believe me, I learned that the hard way. It's one thing to argue about the implementation of a feature for a week or four. Or even insist that you're right in some implementation detail local to your new code. But you've not moved one iota in the critical parts that affect large parts of the system in half a year.

Greetings, Andres Freund

-- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Josh Berkus <josh@agliodbs.com> wrote: > On 09/30/2014 07:15 AM, Kevin Grittner wrote: >> At the risk of pushing people away from this POV, I'll point out >> that this is somewhat similar to what we do for unlogged bulk loads >> -- if all the conditions for doing it the fast way are present, we >> do it the fast way; otherwise it still works, but slower. > > Except that switching between fast/slow bulk loads affects *only* the > speed of loading, not the locking rules. Having a statement silently > take a full table lock when we were expecting it to be concurrent > (because, for example, the index got rebuilt and someone forgot the > UNIQUE) violates POLA from my perspective. I would not think that an approach which took a full table lock to implement the more general case would be accepted. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 09/30/2014 02:39 PM, Kevin Grittner wrote: > Josh Berkus <josh@agliodbs.com> wrote: >> On 09/30/2014 07:15 AM, Kevin Grittner wrote: > >>> At the risk of pushing people away from this POV, I'll point out >>> that this is somewhat similar to what we do for unlogged bulk loads >>> -- if all the conditions for doing it the fast way are present, we >>> do it the fast way; otherwise it still works, but slower. >> >> Except that switching between fast/slow bulk loads affects *only* the >> speed of loading, not the locking rules. Having a statement silently >> take a full table lock when we were expecting it to be concurrent >> (because, for example, the index got rebuilt and someone forgot the >> UNIQUE) violates POLA from my perspective. > > I would not think that an approach which took a full table lock to > implement the more general case would be accepted. Why not? There are certainly cases ... like bulk loading ... where users would find it completely acceptable. Imagine that you're merging 3 files into a single unlogged table before processing them into finished data. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Josh Berkus <josh@agliodbs.com> wrote: > On 09/30/2014 02:39 PM, Kevin Grittner wrote: >> Josh Berkus <josh@agliodbs.com> wrote: >>> On 09/30/2014 07:15 AM, Kevin Grittner wrote: >>> >>>> At the risk of pushing people away from this POV, I'll point out >>>> that this is somewhat similar to what we do for unlogged bulk loads >>>> -- if all the conditions for doing it the fast way are present, we >>>> do it the fast way; otherwise it still works, but slower. >>> >>> Except that switching between fast/slow bulk loads affects *only* the >>> speed of loading, not the locking rules. Having a statement silently >>> take a full table lock when we were expecting it to be concurrent >>> (because, for example, the index got rebuilt and someone forgot the >>> UNIQUE) violates POLA from my perspective. >> >> I would not think that an approach which took a full table lock to >> implement the more general case would be accepted. > > Why not? There are certainly cases ... like bulk loading ... where > users would find it completely acceptable. Imagine that you're merging > 3 files into a single unlogged table before processing them into > finished data. So the expectation is that when we implement MERGE it will, by default, take out an EXCLUSIVE lock for the entire target table for the entire duration of the command? I would have expected a bit more finesse. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 09/30/2014 02:51 PM, Kevin Grittner wrote: > Josh Berkus <josh@agliodbs.com> wrote: >> On 09/30/2014 02:39 PM, Kevin Grittner wrote: >>> Josh Berkus <josh@agliodbs.com> wrote: >>>> On 09/30/2014 07:15 AM, Kevin Grittner wrote: >>>> >>>>> At the risk of pushing people away from this POV, I'll point out >>>>> that this is somewhat similar to what we do for unlogged bulk loads >>>>> -- if all the conditions for doing it the fast way are present, we >>>>> do it the fast way; otherwise it still works, but slower. >>>> >>>> Except that switching between fast/slow bulk loads affects *only* the >>>> speed of loading, not the locking rules. Having a statement silently >>>> take a full table lock when we were expecting it to be concurrent >>>> (because, for example, the index got rebuilt and someone forgot the >>>> UNIQUE) violates POLA from my perspective. >>> >>> I would not think that an approach which took a full table lock to >>> implement the more general case would be accepted. >> >> Why not? There are certainly cases ... like bulk loading ... where >> users would find it completely acceptable. Imagine that you're merging >> 3 files into a single unlogged table before processing them into >> finished data. > > So the expectation is that when we implement MERGE it will, by > default, take out an EXCLUSIVE lock for the entire target table for > the entire duration of the command? I would have expected a bit > more finesse. I don't know that that is the *expectation*. However, I personally would find it *acceptable* if it meant that we could get efficient merge semantics on other aspects of the syntax, since my primary use for MERGE is bulk loading. Regardless, I don't think there's any theoretical way to support UPSERT without a unique constraint. Therefore eventual support of this would require a full table lock. Therefore having it use the same command as UPSERT with a unique constraint is a bit of a booby trap for users. 
This is a lot like the "ADD COLUMN with a default rewrites the whole table" booby trap which hundreds of our users complain about every month. We don't want to add more such unexpected consequences for users. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On 2014-09-30 14:51:57 -0700, Kevin Grittner wrote:
> Josh Berkus <josh@agliodbs.com> wrote:
> > On 09/30/2014 02:39 PM, Kevin Grittner wrote:
> >> Josh Berkus <josh@agliodbs.com> wrote:
> >>> On 09/30/2014 07:15 AM, Kevin Grittner wrote:
> >>>
> >>>> At the risk of pushing people away from this POV, I'll point out that this is somewhat similar to what we do for unlogged bulk loads -- if all the conditions for doing it the fast way are present, we do it the fast way; otherwise it still works, but slower.
> >>>
> >>> Except that switching between fast/slow bulk loads affects *only* the speed of loading, not the locking rules. Having a statement silently take a full table lock when we were expecting it to be concurrent (because, for example, the index got rebuilt and someone forgot the UNIQUE) violates POLA from my perspective.
> >>
> >> I would not think that an approach which took a full table lock to implement the more general case would be accepted.
> >
> > Why not? There are certainly cases ... like bulk loading ... where users would find it completely acceptable. Imagine that you're merging 3 files into a single unlogged table before processing them into finished data.
>
> So the expectation is that when we implement MERGE it will, by default, take out an EXCLUSIVE lock for the entire target table for the entire duration of the command? I would have expected a bit more finesse.

I think it'd be acceptable. Alternatively we'll just accept that you can get uniqueness violations under concurrency. In many cases that'll be fine.

Greetings, Andres Freund

-- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Sep 30, 2014 at 3:01 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> I think it'd be acceptable. Alternatively we'll just accept that you can get uniqueness violations under concurrency. In many cases that'll be fine.

I think living with unique violations is the right thing with MERGE, fwiw.

-- Peter Geoghegan
On 2014-09-30 14:57:43 -0700, Josh Berkus wrote: > Regardless, I don't think there's any theoretical way to support UPSERT > without a unique constraint. You can do stuff like blocking predicate locking. But without indexes to support it that gets awfully complicated and unfunny. I don't think we want to go there. So essentially I agree with that statement. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> wrote:
> On 2014-09-30 14:57:43 -0700, Josh Berkus wrote:
>
>> Regardless, I don't think there's any theoretical way to support UPSERT without a unique constraint.
>
> You can do stuff like blocking predicate locking. But without indexes to support it that gets awfully complicated and unfunny. I don't think we want to go there. So essentially I agree with that statement.

Well, as you seem to be saying, it's not too bad even with a non-unique index if we wanted to do a little extra work; and there are a lot of ways to potentially deal with it even without that. Theoretically, the number of ways to do this is limited only by time available to brainstorm.

That said, at no time have I advocated that we try to implement UPSERT in this release with anything but a UNIQUE index. The issue I raised was whether a subset of the MERGE syntax should be used to specify UPSERT rather than inventing our own syntax -- which doesn't seem in any way incompatible with requiring a unique index to match the expression. Given subsequent discussion, perhaps we could decorate it with something to indicate which manner of concurrency handling is desired? Techniques discussed so far are:

- UPSERT style
- Hold an EXCLUSIVE lock on the table
- Allow "native" concurrency management

An alternative which seems to be on some people's minds is to use a different command name for the first option (but why not keep the rest of the standard syntax?) and to require an explicit LOCK TABLE statement at the start of the transaction if you want the second option.

My preference, after this discussion, would be to default to UPSERT style if the appropriate conditions are met, and to default to the third option otherwise. If you want an exclusive lock, ask for it with the LOCK TABLE statement.

-- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
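Kevin's second option, spelled out explicitly, would look roughly like the following. LOCK TABLE ... IN EXCLUSIVE MODE is existing PostgreSQL syntax; the MERGE statement itself is hypothetical in this context, since PostgreSQL does not implement it:

```sql
BEGIN;
-- Opt into table-level concurrency handling explicitly, rather than
-- having the command take a heavyweight lock silently:
LOCK TABLE upsert IN EXCLUSIVE MODE;
MERGE INTO upsert ...;  -- hypothetical
COMMIT;
```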
On Tue, Sep 30, 2014 at 2:15 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> Well. People have given you outlines of approaches. And Heikki even gave you a somewhat working prototype. I don't think you can fairly expect more.

I don't expect anything, really. I asked nicely - that's all. I don't know why there is so much discussion of what I expect or don't expect. Things don't work around here by everyone doing only strictly what they're obligated to do. Everyone is strictly obligated to do nothing, when you get right down to it.

>> Even still, I would appreciate it if people could review the patch on the assumption that those issues will be worked out.
>
> Right now I don't really see the point. You've so far shown no inclination to accept significant concerns about your approach. And without an agreement about how to solve the concurrency issues the feature is dead in the water. And thus time spent reviewing isn't well spent.
>
> I'm pretty sure I'm not the only one feeling that way at this point.

I think that's *incredibly* unfair. There appears to be broad acceptance of the problems around deadlocking as a result of my work with Heikki. That was a major step forward. Now we all agree on the parameters of the discussion around value locking, AFAICT. There is an actual way forward, and not total quagmire -- great. I had to dig my heels in to win that much, and it wasn't easy. I accept that it probably wasn't easy for other people either, and I am thankful for the effort of other people, particularly Heikki, but also you.

>> A couple of weeks ago, I felt good about the fact that it seemed "time was on my side" 9.5-wise, but maybe that isn't true. Working through the community process for this patch is going to be very difficult.
>
> The community process involves accepting that your opinion isn't the community's. Believe me, I learned that the hard way.

The community doesn't have a worked-out opinion on this either way.
Arguably, what you and Simon want to do is closer to what I want to do than to what Heikki wants to do - you're still talking about adding locks that are tied to AMs in a fairly fundamental way. But, FWIW, I'd sooner take Heikki's approach than insert promise tuples into indexes directly. I think that Heikki's approach is better. In all honesty, I don't care who "wins", as long as someone does and we get the feature in shape. No one can "win" if all sides are not realistic about the problems.

The issues that I've called out about what Heikki has suggested are quite significant issues. Can't we talk about them? Or am I required to polish up Heikki's approach, and present it at a commitfest, only to have somebody point out the same issues then? I am *not* nitpicking, and the issues are of fundamental importance. Look at the issues I raise and you'll see that's the case. My pointing out of these issues is not some artifice to "win" the argument. I don't appreciate the insinuation that it is. I am completely undeserving of that sort of mistrust. It's insulting. And it's also a total misrepresentation to suggest it's me versus you, Heikki, Robert, and Simon. Opinion is far more divided than you let on, since what you and Simon suggest is far different to what Heikki suggests. Let's figure out a way to reach agreement.

> It's one thing to argue about the implementation of a feature for a week or four. Or even insist that you're right in some implementation detail local to your new code. But you've not moved one iota in the critical parts that affect large parts of the system in half a year.

You're right. I haven't moved one bit on that. But, on the other hand, I haven't doubled down on the approach either - I have done very little on it, and have given it relatively little thought either way. I preferred to focus my energies on the "top half". Surely you'd agree that that was the logical course of action to take over the last few months.
I don't know if you noticed, but I presented this whole new revised version as "this is the thing that gives us the ability to discuss the fundamental issue of value locking". So my suggestion was that if you don't want to have that conversation, at least look at the "top half" a bit. -- Peter Geoghegan
On 30 September 2014 19:49, Josh Berkus <josh@agliodbs.com> wrote: > On 09/30/2014 11:20 AM, Peter Geoghegan wrote: >>> > For example, this patch for UPSERT doesn't support updatable views. >>> > But I can't see anyone that didn't read the patch would know that. >> By reading the CREATE VIEW docs. Maybe there could stand to be a >> compatibility note in the main INSERT command, but I didn't want to do >> that as long as things were up in the air. It might be the case that >> we figure out good behavior for updatable views. > > All of these things sound like good ideas for documentation > improvements, but hardly anything which should block the patch. It has > documentation, more than we'd require for a lot of other patches, and > it's not like the 9.5 release is next month. We won't get consensus simply by saying "Would you like a fast upsert feature?" because everyone says Yes to that. A clear description of the feature being added is necessary to agree its acceptance. When we implement a SQL Standard feature, we can just look in the standard to see how it should work and compare. When we go off-piste, we need more info to make sure we know what we are getting as well as why we are not getting something from the Standard. I have not suggested I would block the patch because it doesn't have docs. I have pointed out that the lack of consensus about the patch is because nobody knows what it contains, which others agreed with. My request was, and is, a proposed mechanism to *unblock* a very obviously stalled patch. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Sep 30, 2014 at 4:28 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > A clear description of the feature being added is necessary to agree > its acceptance. When we implement a SQL Standard feature, we can just > look in the standard to see how it should work and compare. When we go > off-piste, we need more info to make sure we know what we are getting > as well as why we are not getting something from the Standard. I think that's fair. > I have not suggested I would block the patch because it doesn't have > docs. I have pointed out that the lack of consensus about the patch is > because nobody knows what it contains, which others agreed with. My > request was, and is, a proposed mechanism to *unblock* a very > obviously stalled patch. Please keep asking questions - it isn't necessarily obvious to me *what* isn't clear, because of my lack of perspective. That's a useful role. It occurs to me now that I ought to have found a place to document "cardinality violations" [1], but I didn't, for example. [1] http://tracker.firebirdsql.org/browse/CORE-2274 -- Peter Geoghegan
On 2014-09-26 16:19:33 -0700, Peter Geoghegan wrote: > On Fri, Sep 26, 2014 at 3:25 PM, Peter Geoghegan <pg@heroku.com> wrote: > > On Fri, Sep 26, 2014 at 3:11 PM, Alvaro Herrera > > <alvherre@2ndquadrant.com> wrote: > >> FWIW there are 28 callers of HeapTupleHeaderGetXmin. > > > Don't forget about direct callers to HeapTupleHeaderGetRawXmin(), > > though. There are plenty of those in tqual.c. > > Which reminds me: commit 37484ad2 added the opportunistic freezing > stuff. To quote the commit message: > > """ > Instead of changing the tuple xmin to FrozenTransactionId, the combination > of HEAP_XMIN_COMMITTED and HEAP_XMIN_INVALID, which were previously never > set together, is now defined as HEAP_XMIN_FROZEN. A variety of previous > proposals to freeze tuples opportunistically before vacuum_freeze_min_age > is reached have foundered on the objection that replacing xmin by > FrozenTransactionId might hinder debugging efforts when things in this > area go awry; this patch is intended to solve that problem by keeping > the XID around (but largely ignoring the value to which it is set). > > """ > > Why wouldn't the same objection (the objection that the earlier > opportunistic freezing ideas stalled on) apply to directly setting > tuple xmin to InvalidTransactionId? Because it's pretty much unrelated? The FrozenTransactionId bit you reference is about tuples that actually survive, which isn't the case here. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Sep 30, 2014 at 02:57:43PM -0700, Josh Berkus wrote: > I don't know that that is the *expectation*. However, I personally > would find it *acceptable* if it meant that we could get efficient merge > semantics on other aspects of the syntax, since my primary use for MERGE > is bulk loading. > > Regardless, I don't think there's any theoretical way to support UPSERT > without a unique constraint. Therefore eventual support of this would > require a full table lock. Therefore having it use the same command as > UPSERT with a unique constraint is a bit of a booby trap for users. > This is a lot like the "ADD COLUMN with a default rewrites the whole > table" booby trap which hundreds of our users complain about every > month. We don't want to add more such unexpected consequences for users. I think if we use the MERGE command for this feature we would need to use a non-standard keyword to specify that we want OLTP/UPSERT functionality. That would allow us to mostly use the MERGE standard syntax without having surprises about non-standard behavior. I am thinking of how CONCURRENTLY changes the behavior of some commands. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
On Thu, Oct 2, 2014 at 1:10 PM, Bruce Momjian <bruce@momjian.us> wrote: > I think if we use the MERGE command for this feature we would need to > use a non-standard keyword to specify that we want OLTP/UPSERT > functionality. That would allow us to mostly use the MERGE standard > syntax without having surprises about non-standard behavior. I am > thinking of how CONCURRENTLY changes the behavior of some commands. That would leave you without a real general syntax. It'd also make having certain aspects of an UPSERT more explicit be a harder goal (there is no conventional join involved here - everything goes through a unique index). Adding the magic keyword would break certain other parts of the statement, so you'd have exact rules for what worked where. I see no advantage, and considerable disadvantages. Note that I've documented a lot of this stuff here: https://wiki.postgresql.org/wiki/UPSERT Mapping the join thing onto which unique index you want to make the UPSERT target is very messy. There are a lot of corner cases. It's quite ticklish. Please add to it if you think we've missed something. -- Peter Geoghegan
On Thu, Oct 2, 2014 at 02:08:30PM -0700, Peter Geoghegan wrote: > On Thu, Oct 2, 2014 at 1:10 PM, Bruce Momjian <bruce@momjian.us> wrote: > > I think if we use the MERGE command for this feature we would need to > > use a non-standard keyword to specify that we want OLTP/UPSERT > > functionality. That would allow us to mostly use the MERGE standard > > syntax without having surprises about non-standard behavior. I am > > thinking of how CONCURRENTLY changes the behavior of some commands. > > That would leave you without a real general syntax. It'd also make > having certain aspects of an UPSERT more explicit be a harder goal > (there is no conventional join involved here - everything goes through > a unique index). Adding the magic keyword would break certain other > parts of the statement, so you'd have exact rules for what worked > where. I see no advantage, and considerable disadvantages. > > Note that I've documented a lot of this stuff here: > > https://wiki.postgresql.org/wiki/UPSERT > > Mapping the join thing onto which unique index you want to make the > UPSERT target is very messy. There are a lot of corner cases. It's > quite ticklish. > > Please add to it if you think we've missed something. OK, it was just an idea I wanted to point out, and if it doesn't work, it more clearly cements that we need UPSERT _and_ MERGE. Josh was pointing out that we don't want to surprise our users, so I suggested an additional keyword, which addresses his objections, but as you said, if that standard MERGE syntax doesn't give us what we want, then that is the fatal objection to using only MERGE. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
On Thu, Sep 4, 2014 at 12:13 AM, Peter Geoghegan <pg@heroku.com> wrote: > On Wed, Sep 3, 2014 at 9:51 AM, Robert Haas <robertmhaas@gmail.com> wrote: >>> INSERT INTO upsert(key, val) VALUES(1, 'insert') ON CONFLICT WITHIN >>> upsert_pkey UPDATE SET val = 'update'; >> >> It seems to me that it would be better to specify a conflicting column >> set rather than a conflicting index name. > > I'm open to pursuing that, provided there is a possible implementation > that's robust against things like BEFORE triggers that modify > constrained attributes. It must also work well with partial unique > indexes. So I imagine we'd have to determine a way of looking up the > unique index only after BEFORE triggers fire. Unless you're > comfortable with punting on some of these cases by throwing an error, > then all of this is actually surprisingly ticklish. Speaking of this, I really don't like the proposed behavior of firing BEFORE INSERT triggers even before we've decided whether to insert or update. In the "classical" upsert pattern, changes by a BEFORE INSERT trigger would get rolled back on conflict, but the new approach seems surprising: changes from BEFORE INSERT get persisted in the database, but AFTER INSERT is not fired. I haven't found any discussion about alternative trigger semantics for upsert. If there has been any, can you point me to it?

----

How about this: use the original VALUES results for acquiring a value lock; if indeed the row didn't conflict, *then* fire BEFORE INSERT triggers, and throw an error if the trigger changed any columns of the (specified?) unique key. Advantages of this approach:

1. Would solve the above conundrum about specifying a unique index via columns.
2. In the UPDATE case we can skip evaluating INSERT triggers and DEFAULT expressions for columns.
3. If I'm not missing anything, this approach may also let us get rid of the CONFLICTING() construct.
4. Possibly be closer to MySQL's syntax?
Point (2) is actually a major consideration IMO: if your query is mostly performing UPDATEs, on a table with SERIAL keys, and you're using a different key to perform the updates, then you waste sequence values unnecessarily. I believe this is a very common pattern, for example:

create table evt_type (id serial primary key, name text unique, evt_count int);
prepare upsert(text) as INSERT into evt_type (name, evt_count) values ($1, 1)
  on conflict within evt_type_name_key UPDATE set evt_count=evt_count+1;
execute upsert('foo');
execute upsert('foo');
execute upsert('bar');

# table evt_type;
 id | name | evt_count
----+------+-----------
  1 | foo  |         2
  3 | bar  |         1   <-- id could very well be "2"

Regards, Marti
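To make the trigger restriction proposed above concrete, here is a hedged sketch (the trigger and function names are invented; it assumes the "upsert" example table from upthread with a unique constraint on "key") of the kind of BEFORE INSERT trigger that would have to raise an error under this scheme, because it modifies a column of the unique key after the value lock was acquired on the original VALUES():

```sql
-- Hypothetical illustration only.
CREATE FUNCTION munge_key() RETURNS trigger AS $$
BEGIN
  -- Changing a constrained column here would invalidate a value lock
  -- taken on the original VALUES(), so the proposal would reject it.
  NEW.key := NEW.key + 100;
  RETURN NEW;
END $$ LANGUAGE plpgsql;

CREATE TRIGGER upsert_before BEFORE INSERT ON upsert
  FOR EACH ROW EXECUTE PROCEDURE munge_key();

-- Under the proposed semantics, this statement would error out rather
-- than silently insert a row with key = 101:
INSERT INTO upsert(key, val) VALUES(1, 'insert')
ON CONFLICT UPDATE SET val = 'update';
```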
On 2 October 2014 22:37, Bruce Momjian <bruce@momjian.us> wrote: > OK, it is was just an idea I wanted to point out, and if it doesn't > work, it more clearly cements that we need UPSERT _and_ MERGE. It seems clear that having two different initial keywords is popular because it provides clarity about which aspects of the commands will be supported. I like the idea of making the two commands as close as possible in syntax, which will make it easier to program for and encourage adoption. The command name could easily be MERGE [CONCURRENTLY] since that uses the same concept from earlier DDL syntax/keywords. In UPSERT, we don't need the ON keyword at all. If we are altering the syntax, then we can easily remove this. IIRC it wasn't agreed that we needed to identify which indexes in the upsert SQL statement itself, since this would be possible in other ways and would require programmers to know which unique constraints are declared. All of the other syntax could easily remain the same, leaving us with a command that looks like this...

MERGE CONCURRENTLY INTO foo USING VALUES ()
WHEN NOT MATCHED THEN
  INSERT
WHEN MATCHED THEN
  UPDATE

Since MERGE now supports DELETE and IGNORE as options, presumably we would also want to support those for the UPSERT version also. I think it would be useful to also support a mechanism for raising an error, as DB2 allows.
More complex example of MERGE

MERGE INTO product AS T
USING (SELECT sales.id, sum(sold) AS sold, max(catalog.name) as name
       FROM sales, catalog
       WHERE sales.id = catalog.id
       GROUP BY sales.id) AS S
ON S.id = T.id
WHEN MATCHED AND T.inventory = S.sold THEN
  DELETE
WHEN MATCHED AND T.inventory < S.sold THEN
  SIGNAL SQLSTATE '78000' SET MESSAGE_TEXT = 'Oversold: ' || S.name
WHEN MATCHED THEN
  UPDATE SET inventory = T.inventory - S.sold
WHEN NOT MATCHED THEN
  INSERT VALUES(S.id, S.name, -S.sold);

Full example would be similar to this

MERGE CONCURRENTLY INTO product AS T
USING VALUES ()
WHEN MATCHED AND T.inventory = S.sold THEN
  DELETE
WHEN MATCHED AND T.inventory < S.sold THEN
  SIGNAL SQLSTATE '78000' SET MESSAGE_TEXT = 'Oversold: ' || S.name
WHEN MATCHED THEN
  UPDATE SET inventory = T.inventory - S.sold
WHEN NOT MATCHED THEN
  INSERT VALUES(S.id, S.name, -S.sold);

-- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Oct 7, 2014 at 5:23 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > IIRC it wasn't agreed that we needed to identify which indexes in the > upsert SQL statement itself, since this would be possible in other > ways and would require programmers to know which unique constraints > are declared. Kevin seemed quite concerned about that. That is something that seems hard to reconcile with supporting the MERGE syntax. Perhaps Kevin can comment on that, since he was in favor of both being able to specify user intent by accepting a unique index, while also being in favor of the MERGE syntax. > All of the other syntax could easily remain the same, leaving us with > a command that looks like this... > > MERGE CONCURRENTLY INTO foo USING VALUES () > WHEN NOT MATCHED THEN > INSERT > WHEN MATCHED THEN > UPDATE > > Since MERGE now supports DELETE and IGNORE as options, presumably we > would also want to support those for the UPSERT version also. > I think it would be useful to also support a mechanism for raising an > error, as DB2 allows. It seems like what you're talking about here is just changing the spelling of what I already have. I think that would be confusing to users when the time comes to actually implement a fully-generalized MERGE, even with the clearly distinct MERGE CONCURRENTLY variant outlined here (which, of course, lacks an outer join, unlike MERGE proper). However, unlike the idea of trying to square the circle of producing a general purpose MERGE command that also supports the UPSERT use-case, my objection to this much more limited proposal is made purely on aesthetic grounds. I think that it is not very user-friendly; I do not think that it's a total disaster, which is what trying to solve both problems at once (MERGE bulkloading and UPSERTing) would result in. So FWIW, if the community is really set on something that includes the keyword MERGE, which is really all you outline here, then I can live with that. -- Peter Geoghegan
On Wed, Oct 8, 2014 at 3:47 AM, Peter Geoghegan <pg@heroku.com> wrote: > It seems like what you're talking about here is just changing the > spelling of what I already have. I think there's a subtle difference in expectations too. The current BEFORE INSERT trigger behavior is somewhat defensible with an INSERT-driven syntax (though I don't like it even now [1]). But the MERGE syntax, to me, strongly implies that insertion doesn't begin before determining whether a conflict exists or not. [1] http://www.postgresql.org/message-id/CABRT9RD6zriK+t6mnqQOqaozZ5z1bUaKh+kNY=O9ZqBZFoAuBg@mail.gmail.com Regards, Marti
On Wed, Oct 8, 2014 at 1:36 AM, Marti Raudsepp <marti@juffo.org> wrote: > I think there's a subtle difference in expectations too. The current > BEFORE INSERT trigger behavior is somewhat defensible with an > INSERT-driven syntax (though I don't like it even now [1]). There is no way around it. We need to fire before row triggers to know what to insert on the one hand, but on the other hand (in general) we have zero ability to nullify the effects (or side-effects) of before triggers, since they may execute arbitrary user-defined code. I think there is a good case to be made for severely restricting what before row triggers can do, but it's too late for that. > But the > MERGE syntax, to me, strongly implies that insertion doesn't begin > before determining whether a conflict exists or not. I think you're right. Another strike against the MERGE syntax, then, since as I said we cannot even know what to check prior to having before row insert triggers fire. -- Peter Geoghegan
On Tue, Oct 7, 2014 at 2:27 PM, Marti Raudsepp <marti@juffo.org> wrote: > but the new approach seems > surprising: changes from BEFORE INSERT get persisted in the database, > but AFTER INSERT is not fired. I am sorry, I realize now that I misunderstood the current proposed trigger behavior, I thought what Simon Riggs wrote here already happens: https://groups.google.com/forum/#!msg/django-developers/hdzkoLYVjBY/bnXyBVqx95EJ But the point still stands: firing INSERT triggers when the UPDATE path is taken is counterintuitive. If we prevent changes of upsert key columns in BEFORE triggers then we get the benefits, including more straightforward trigger behavior and avoid problems with serial columns. Regards, Marti
On Wed, Oct 8, 2014 at 12:28 PM, Peter Geoghegan <pg@heroku.com> wrote: > On Wed, Oct 8, 2014 at 1:36 AM, Marti Raudsepp <marti@juffo.org> wrote: >> I think there's a subtle difference in expectations too. The current >> BEFORE INSERT trigger behavior is somewhat defensible with an >> INSERT-driven syntax (though I don't like it even now [1]). > > There is no way around it. We need to fire before row triggers to know > what to insert on the one hand, but on the other hand (in general) we > have zero ability to nullify the effects (or side-effects) of before > triggers, since they may execute arbitrary user-defined code. With my proposal this problem disappears: if we prevent BEFORE triggers from changing key attributes of NEW in the case of upsert, then we can acquire value locks before firing any triggers (before even constructing the whole tuple), and have a guarantee that the value locks are still valid by the time we proceed with the actual insert/update. Other than changing NEW, the side effects of triggers are not relevant. Now, there may very well be reasons why this is tricky to implement, but I haven't heard any. Can you see any concrete reasons why this won't work? I can take a shot at implementing this, if you're willing to consider it. > I think > there is a good case to be made for severely restricting what before > row triggers can do, but it's too late for that. There are no users of new "upsert" syntax out there yet, so it's not too late to rehash the semantics of that. This in no way affects users of old INSERT/UPDATE syntax. Regards, Marti
On 8 October 2014 01:47, Peter Geoghegan <pg@heroku.com> wrote: > It seems like what you're talking about here is just changing the > spelling of what I already have. I think that would be confusing to > users when the time comes to actually implement a fully-generalized > MERGE, even with the clearly distinct MERGE CONCURRENTLY variant > outlined here (which, of course, lacks an outer join, unlike MERGE > proper). I change my view on this, after some more thought. (Hope that helps) If we implement MERGE, I can see we may also wish to implement MERGE CONCURRENTLY one day. That would be different to UPSERT. So in the future I think we will need 3 commands 1. MERGE 2. MERGE CONCURRENTLY 3. UPSERT So we no longer need to have the command start with the MERGE keyword. > However, unlike the idea of trying to square the circle of producing a > general purpose MERGE command that also supports the UPSERT use-case, > my objection to this much more limited proposal is made purely on > aesthetic grounds. I think that it is not very user-friendly; I do not > think that it's a total disaster, which is what trying to solve both > problems at once (MERGE bulkloading and UPSERTing) would result in. So > FWIW, if the community is really set on something that includes the > keyword MERGE, which is really all you outline here, then I can live > with that. We will one day have MERGE according to the SQL Standard. My opinion is that syntax for this should be similar to MERGE in the *body* of the command, rather than some completely different syntax. e.g. > WHEN NOT MATCHED THEN > INSERT > WHEN MATCHED THEN > UPDATE I'm happy that we put that to a vote on what the syntax should be, as long as we bear in mind that we will one day have MERGE as well. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Sep 29, 2014 at 7:21 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > Having said that, it would be much nicer to have a mode that allows > you to just say the word "UPDATE" and have it copy the data into the > correct columns, like MySQL does. That is very intuitive, even if it > isn't very flexible. I thought about this, and at first I agreed, but now I'm not so sure. Consider the case where you write an INSERT ... ON CONFLICT UPDATE ALL query, or however we might spell this idea. 1. Developer writes the query, and it works fine. 2. Some time later, the DBA adds an inserted_at column (those are common). The DBA is not aware of the existence of this particular query. The new column has a default value of now(), say. Should we UPDATE the inserted_at column to be NULL? Or (more plausibly) the default value filled in by the INSERT? Or leave it be? I think there is a case to be made for all of these behaviors, and that tension makes me prefer to not do this at all. It's like encouraging "SELECT *" queries in production, only worse. -- Peter Geoghegan
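A hedged sketch of the scenario being described (the "UPDATE ALL" spelling and the inserted_at column are hypothetical, not part of the patch):

```sql
-- The DBA later adds an audit column with a default:
ALTER TABLE upsert ADD COLUMN inserted_at timestamptz DEFAULT now();

-- Hypothetical "update every column" spelling. On the UPDATE path,
-- should inserted_at become NULL, the freshly evaluated default
-- (clobbering the original insertion time), or be left untouched?
INSERT INTO upsert(key, val) VALUES(1, 'insert')
ON CONFLICT UPDATE ALL;  -- invented syntax, shown only to pose the question
```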
On Wed, Oct 8, 2014 at 12:12 PM, Peter Geoghegan <pg@heroku.com> wrote: > On Mon, Sep 29, 2014 at 7:21 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >> Having said that, it would be much nicer to have a mode that allows >> you to just say the word "UPDATE" and have it copy the data into the >> correct columns, like MySQL does. That is very intuitive, even if it >> isn't very flexible. > > I thought about this, and at first I agreed, but now I'm not so sure. Actually, I don't think MySQL supports this. It doesn't allow INSERT ON DUPLICATE KEY UPDATE to do it, AFAICT. Their REPLACE syntax supports that, but that's a feature that is quite distinct to what I have in mind here. -- Peter Geoghegan
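For reference, the two MySQL behaviors being contrasted here look like this (MySQL syntax; column names reuse the running example):

```sql
-- ON DUPLICATE KEY UPDATE requires an explicit assignment per column;
-- there is no "copy everything" mode:
INSERT INTO upsert (`key`, val) VALUES (1, 'insert')
ON DUPLICATE KEY UPDATE val = VALUES(val);

-- REPLACE takes the new row wholesale, but does so by removing the
-- conflicting row and inserting a fresh one, so any column not
-- supplied reverts to its default rather than keeping its old value:
REPLACE INTO upsert (`key`, val) VALUES (1, 'insert');
```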
On Wed, Oct 8, 2014 at 6:25 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > I change my view on this, after some more thought. (Hope that helps) Great. > If we implement MERGE, I can see we may also wish to implement MERGE > CONCURRENTLY one day. That would be different to UPSERT. > > So in the future I think we will need 3 commands > > 1. MERGE > 2. MERGE CONCURRENTLY > 3. UPSERT > > So we no longer need to have the command start with the MERGE keyword. As I've outlined, I don't see how MERGE CONCURRENTLY could ever work, but I'm glad that you agree that it should not block this effort (or indeed, some later effort to implement a MERGE that is comparable to the implementations of other database systems). > We will one day have MERGE according to the SQL Standard. Agreed. > My opinion is that syntax for this should be similar to MERGE in the > *body* of the command, rather than some completely different syntax. > e.g. > >> WHEN NOT MATCHED THEN >> INSERT >> WHEN MATCHED THEN >> UPDATE > > I'm happy that we put that to a vote on what the syntax should be, as > long as we bear in mind that we will one day have MERGE as well. While I am also happy with taking a vote, if we do so I vote against even this much less MERGE-like syntax. It's verbose, and makes much less sense when the mechanism is driven by would-be duplicate key violations rather than an outer join. I also like that when you UPSERT with the proposed ON CONFLICT UPDATE syntax, you get all the flexibility of an INSERT - you can use data-modifying CTEs, and nest the statement in a data-modifying CTE, and "INSERT ... SELECT... ON CONFLICT UPDATE ..." . And to be honest, it's much simpler to implement this whole feature as an adjunct to how INSERT statements are currently processed (during parse analysis, planning and execution); I don't want to make the syntax work against that. 
For example, consider how little I had to change the grammar to make all of this work:

$ git diff master --stat | grep gram
 src/backend/parser/gram.y | 72 ++-

The code footprint of this patch is relatively small, and I think we can all agree that that's a good thing. -- Peter Geoghegan
On 08/10/14 11:28, Peter Geoghegan wrote: > On Wed, Oct 8, 2014 at 1:36 AM, Marti Raudsepp <marti@juffo.org> wrote: >> But the >> MERGE syntax, to me, strongly implies that insertion doesn't begin >> before determining whether a conflict exists or not. > > I think you're right. Another strike against the MERGE syntax, then, > since as I said we cannot even know what to check prior to having > before row insert triggers fire. > True, but to me it also seems to be strike against using INSERT for this as I don't really see how you can make triggers work in a sane way if the UPSERT is implemented as part of INSERT (at least I haven't seen any proposal that I would consider sane from the user point of view). -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Peter Geoghegan <pg@heroku.com> wrote: > On Tue, Oct 7, 2014 at 5:23 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >> IIRC it wasn't agreed that we needed to identify which indexes in the >> upsert SQL statement itself, since this would be possible in other >> ways and would require programmers to know which unique constraints >> are declared. > > Kevin seemed quite concerned about that. That is something that seems > hard to reconcile with supporting the MERGE syntax. Perhaps Kevin can > comment on that, since he was in favor of both being able to specify > user intent by accepting a unique index, while also being in favor of > the MERGE syntax. Well, I mostly wanted to make sure we properly considered what the implications were of using the standard syntax without other keywords or decorations before deciding to go the non-standard route. In spite of an alarming tendency for people to assume that meant that I didn't understand the desired semantics, I feel enough people have understood the question and weighed in in favor of an explicit choice between semantics, rather than inferring concurrency handling based on the availability of the index necessary for the slicker behavior. I'm willing to concede that overall consensus is leaning toward the view that UPSERT semantics should be conditioned on explicit syntax; I'll drop that much going forward. Granting that, I will say that I lean toward either the MERGE syntax with CONCURRENTLY being the flag to use UPSERT semantics, or a separate UPSERT command which is as close to identical to the MERGE syntax (other than the opening verb) as possible. I see that as still needing the ON clause so that you can specify which values match which columns from the target table. I'm fine with starting with the syntax in the standard, which has no DELETE or IGNORE options (as of the latest version I've seen). 
So the syntax I'm suggesting is close to what Simon is suggesting, but a more compliant form would be:

MERGE CONCURRENTLY INTO foo USING (VALUES (valuelist) aliases) ON (conditions)
WHEN NOT MATCHED
  INSERT [ (columnlist) ] VALUES (valuelist)
WHEN MATCHED
  UPDATE SET colname = expression [, ...]

Rather than pseudo-randomly picking a unique index or using a constraint or index name, the ON condition would need to allow matching based on equality to all columns of a unique index which only referenced NOT NULL columns; we would pick an index which matched those conditions. In any event, the unique index would be required if CONCURRENTLY was specified. Using column matching to pick the index (like we do when specifying a FOREIGN KEY constraint) is more in keeping with other SQL statements, and seems generally safer to me. It would also make it fairly painless for people to switch concurrency techniques for what is, after all, exactly the same operation except for differences in handling of concurrent conflicting DML. As I said, I'm also OK with using UPSERT in place of MERGE CONCURRENTLY. I also feel that if we could allow:

USING (VALUES (valuelist) [, ...])

that would be great. In fact, I don't see why that can't be pretty much any relation, but it doesn't have to be for a first cut. A relation would allow a temporary table to be loaded with a batch of rows where the intent is to UPSERT every row in the batch, without needing to write a loop to do it. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
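A sketch of what that batch pattern might look like under the syntax suggested above (everything here is hypothetical: neither MERGE CONCURRENTLY nor a relation-valued USING exists in the patch, and the staging table is invented):

```sql
-- Stage a batch of rows, then upsert the whole batch in one statement,
-- with no client-side loop:
CREATE TEMP TABLE staging (key int, val text);
COPY staging FROM STDIN;

MERGE CONCURRENTLY INTO upsert AS t
USING staging AS s ON (t.key = s.key)
WHEN NOT MATCHED
  INSERT (key, val) VALUES (s.key, s.val)
WHEN MATCHED
  UPDATE SET val = s.val;
```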
On 8 October 2014 21:16, Peter Geoghegan <pg@heroku.com> wrote: >> My opinion is that syntax for this should be similar to MERGE in the >> *body* of the command, rather than some completely different syntax. >> e.g. >> >>> WHEN NOT MATCHED THEN >>> INSERT >>> WHEN MATCHED THEN >>> UPDATE >> >> I'm happy that we put that to a vote on what the syntax should be, as >> long as we bear in mind that we will one day have MERGE as well. > > While I am also happy with taking a vote, if we do so I vote against > even this much less MERGE-like syntax. It's verbose, and makes much > less sense when the mechanism is driven by would-be duplicate key > violations rather than an outer join. It wouldn't be driven by an outer join, not sure where that comes from. MERGE is verbose, I agree. I don't always like the SQL Standard, I just think we should follow it as much as possible. You can't change the fact that MERGE exists, so I don't see a reason to have two variants of syntax that do roughly the same thing. MERGE syntax would allow many things, such as this...

WHEN NOT MATCHED AND x > 7 THEN
  INSERT
WHEN NOT MATCHED THEN
  INSERT
WHEN MATCHED AND y = 5 THEN
  DO NOTHING
WHEN MATCHED THEN
  UPDATE

etc

> I also like that when you UPSERT > with the proposed ON CONFLICT UPDATE syntax, you get all the > flexibility of an INSERT - you can use data-modifying CTEs, and nest > the statement in a data-modifying CTE, and "INSERT ... SELECT... ON > CONFLICT UPDATE ..." . And to be honest, it's much simpler to > implement this whole feature as an adjunct to how INSERT statements > are currently processed (during parse analysis, planning and > execution); I don't want to make the syntax work against that.

I spoke to someone today that preferred a new command keyword, like UPSERT, because the semantics of triggers are weird. Having a before insert trigger fire when there is no insert is quite strange.
Properly documenting that on hackers would help; have the comments made on the Django list been replayed here in some form? I'm very scared by your comments about data modifying CTEs etc. You have no definition of how they will work, nor tests of that. That part isn't looking like a benefit as things currently stand. I'm still waiting for some more docs to describe your intentions so they can be reviewed. Also, it would be useful to hear that you're going to do something about the references to rows using conflicting(), since nobody has agreed with you there. Or hopefully even that you've listened and implemented something differently already. (We need that, whatever we do with other elements of syntax). Overall, I'm not seeing too many comments that indicate you are accepting review comments and acting upon them. If you want acceptance from others, you need to begin with some yourself. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Oct 8, 2014 at 2:04 PM, Simon Riggs <simon@2ndquadrant.com> wrote: >> While I am also happy with taking a vote, if we do so I vote against >> even this much less MERGE-like syntax. It's verbose, and makes much >> less sense when the mechanism is driven by would-be duplicate key >> violations rather than an outer join. > > It wouldn't be driven by an outer join, not sure where that comes from. Right, I understood that it wouldn't be - which is the point. So with an UPSERT that retains influence from MERGE, NOT MATCHED means "no conflict", MATCHED means "conflict". That just seems like an odd way to spell the concept given, as you say, that we're not talking about an outer join. > MERGE is verbose, I agree. I don't always like the SQL Standard, I > just think we should follow it as much as possible. You can't change > the fact that MERGE exists, so I don't see a reason to have two > variants of syntax that do roughly the same thing. > > MERGE syntax would allow many things, such as this... > WHEN NOT MATCHED AND x > 7 THEN > INSERT > WHEN NOT MATCHED THEN > INSERT > WHEN MATCHED AND y = 5 THEN > DO NOTHING > WHEN MATCHED THEN > UPDATE > > etc But then you can have before row insert triggers fire, which as you acknowledge is more surprising with this syntax. > I spoke to someone today that preferred a new command keyword, like > UPSERT, because the semantics of triggers are weird. Having a before > insert trigger fire when there is no insert is quite strange. Properly > documenting that on hackers would help; has the comments made on the > Django list been replayed here in some form? Yes. It's also mentioned in the commit message of CONFLICTING() (patch 0003-*). And the documentation (both the proposed INSERT documentation, and the trigger documentation). There is a large comment on it in the code. So I've said it many times. > I'm very scared by your comments about data modifying CTEs etc.. You > have no definition of how they will work, not tests of that. 
That part > isn't looking like a benefit as things currently stand. Actually, I have a few smoke tests for that. But I don't see any need for special handling. When you have a data-modifying CTE, it can contain an INSERT, and there are no special restrictions on that INSERT (other than that it may not itself have a CTE, but that's true more generally). You can have data-modifying CTEs containing INSERTs, and data-modifying CTEs containing UPDATEs....what I've done is have data-modifying CTEs contain INSERTs that also happen to have an ON CONFLICT UPDATE clause. This new clause of INSERTs is in no more need of special documentation regarding interactions with data-modifying CTEs than UPDATE .... WHERE CURRENT OF is. The only possible exception I can think of would be cardinality violations where a vanilla INSERT in one part of a command (one data-modifying CTE) gives problems to the "UPSERT part" of the same command (because we give a special cardinality violation message when we try to update the same tuple twice in the same command). But that's a pretty imaginative complaint, and I doubt it would really surprise someone. Why would you be surprised by the fact that a new clause for INSERT plays nicely with existing clauses? It's nothing special - there is no special handling. > I'm still waiting for some more docs to describe your intentions so > they can be reviewed. I think it would be useful to add several more isolation tests, highlighting some of the cases you talked about. I'll work on that. While the way forward for WITHIN isn't clear, I think a WITHIN PRIMARY KEY variant would certainly be useful. Maybe it would be okay to forget about naming a specific unique index, while supporting an (optional) WITHIN PRIMARY KEY/NOT WITHIN PRIMARY KEY. 
It doesn't totally solve the problems, but may be a good compromise that mostly satisfies people that want to be able to clearly indicate user intent (Kevin, in particular), and satisfies other people that don't want to name a unique index (Heikki, in particular). Certainly, the Django people would like that, since they said as much. > Also, it would be useful to hear that you're going to do something > about the references to rows using conflicting(), since nobody has > agreed with you there. Or hopefully even that you've listened and > implemented something differently already. (We need that, whatever we > do with other elements of syntax). Do you really expect me to do major work on some aspect of the syntax like that, given, as you say, that nobody explicitly agreed with me (and only you disagreed with me)? The only remark I heard on that was from you (you'd prefer to use NEW.* and OLD.*). But you spent much more time talking about getting something MERGE-like, which NEW.*/OLD.* clearly isn't. CONFLICTING() is very close (identical?) to MySQL's use of "ON DUPLICATE KEY UPDATE val = VALUES(val)". I'm happy to discuss that, but it's news to me that people take particular issue with it. > Overall, I'm not seeing too many comments that indicate you are > accepting review comments and acting upon them. If you want acceptance > from others, you need to begin with some yourself. What, specifically, have I failed to act on? We are discussing the syntax here. I have very valid practical reasons for wanting to make this feature a clause of INSERT. That is a view that Andres seemed to agree with [1], for example. [1] http://www.postgresql.org/message-id/20140929070235.GP1169@alap3.anarazel.de -- Peter Geoghegan
On 8 October 2014 23:24, Peter Geoghegan <pg@heroku.com> wrote: >> Also, it would be useful to hear that you're going to do something >> about the references to rows using conflicting(), since nobody has >> agreed with you there. Or hopefully even that you've listened and >> implemented something differently already. (We need that, whatever we >> do with other elements of syntax). > > Do you really expect me to do major work on some aspect of the syntax > like that, given, as you say, that nobody explicitly agreed with me > (and only you disagreed with me)? The only remark I heard on that was > from you (you'd prefer to use NEW.* and OLD.*). > But you spent much > more time talking about getting something MERGE-like, which > NEW.*/OLD.* clearly isn't. Yes, it is. Look at the AS clause. > CONFLICTING() is very close (identical?) to MySQL's use of "ON > DUPLICATE KEY UPDATE val = VALUES(val)". I'm happy to discuss that, > but it's news to me that people take particular issue with it. 3 people have asked you questions or commented about the use of CONFLICTING() while I've been watching. It's clearly a non-standard mechanism and not inline with other Postgres usage. Nobody actually says "I object to this" - do they need to use that phrase before you take note? I'm beginning to feel that giving you review comments is being seen as some kind of negative action. Needing to repeat myself makes it clear that you aren't taking note. Yes, I expect you to do these things * collect other people's input, even if you personally disagree * if there is disagreement amongst reviewers, seek to resolve that in a fair and reasonable manner * publish a summary of changes requested * do major work to address them So yes, I really expect that. It doesn't matter that it is "only Simon" or "only Kevin". **One** comment is enough for you to take note. If there is disagreement, publishing the summary of changes you plan to make in your next version will help highlight that.
-- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Oct 8, 2014 at 10:49 PM, Simon Riggs <simon@2ndquadrant.com> wrote: >> Do you really expect me to do major work on some aspect of the syntax >> like that, given, as you say, that nobody explicitly agreed with me >> (and only you disagreed with me)? The only remark I heard on that was >> from you (you'd prefer to use NEW.* and OLD.*). >> But you spent much >> more time talking about getting something MERGE-like, which >> NEW.*/OLD.* clearly isn't. > > Yes, it is. Look at the AS clause. You can alias each of the two tables being joined. But I only have one table, and no join. When you referred to NEW.* and OLD.*, you clearly were making a comparison with trigger WHEN clauses, and not MERGE (which is a comparison I made myself, although for more technical reasons). It hardly matters, though. >> CONFLICTING() is very close (identical?) to MySQL's use of "ON >> DUPLICATE KEY UPDATE val = VALUES(val)". I'm happy to discuss that, >> but it's news to me that people take particular issue with it. > > 3 people have asked you questions or commented about the use of > CONFLICTING() while I've been watching. Lots of people have asked me lots of questions. Again, as I said, I wasn't aware that CONFLICTING() was a particular point of contention. Please be more specific. > It's clearly a non-standard > mechanism and not inline with other Postgres usage. What would be "inline with other Postgres usage"? I don't think you've been clear on what you think is a better alternative. I felt a function-like expression was appropriate because the user refers to different tuples of the target table. It isn't like a join. Plus it's similar to the MySQL thing, but doesn't misuse VALUES() as a function-like thing. > If there is disagreement, publishing the summary of changes you plan > to make in your next version will help highlight that. I think I've done a pretty good job of collecting and collating the opinions of others, fwiw. -- Peter Geoghegan
On 9 October 2014 07:27, Peter Geoghegan <pg@heroku.com> wrote: > Please be more specific. Do not use CONFLICTING() which looks like it is a function. Instead, use a row qualifier, such as NEW, OLD etc to reference values from the incoming data e.g. CONFLICTING.value rather than CONFLICTING(value) Do not use the word CONFLICTING since it isn't clear whether you are referring to the row in the table or the value in the incoming data. I suggest the use of two separately named row qualifiers to allow us to use either of those when desired. I don't have suggestions as to what you should call those qualifiers, though Postgres already uses NEW and OLD in similar circumstances in triggers. (This has nothing at all to do with the MERGE command in the SQL standard, so please don't mention that here.) You may also wish to support the AS keyword, as MERGE does to make the above even more clear. e.g. SET col = EXISTING.col + NEW.col Thank you. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Oct 9, 2014 at 12:38 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > Do not use CONFLICTING() which looks like it is a function. So is ROW(). Or COALESCE(). > Instead, use a row qualifier, such as NEW, OLD etc to reference values > from the incoming data > e.g. CONFLICTING.value rather than CONFLICTING(value) > > Do not use the word CONFLICTING since it isn't clear whether you are > referring to the row in the table or the value in the incoming data. If you don't have a word that you think would more clearly indicate the intent of the expression, I'm happy to hear suggestions from others. > You may also wish to support the AS keyword, as MERGE does to make the > above even more clear. > > e.g. SET col = EXISTING.col + NEW.col That's less clear, IMV. EXISTING.col is col - the very same Var. So why qualify that it's the existing value in one place but not the other? In fact, you can't do that now with updates in general:

postgres=# update upsert u set u.val = 'foo';
ERROR:  42703: column "u" of relation "upsert" does not exist
LINE 1: update upsert u set u.val = 'foo';
                            ^
LOCATION:  transformUpdateStmt, analyze.c:2068

This does work, which is kind of what you outline:

postgres=# update upsert u set val = u.val;
UPDATE 3

But MERGE accepts the former in other systems (in general, and for MERGE), where Postgres won't (for UPDATEs in general). Parse analysis of UPDATE targetlists just rejects this outright. FWIW, is either of the two tuples referenced here "NEW", in any sense? Informally, I'd say the new value is the resulting row - the final row value after the UPDATE. We want to refer to the existing row, and the row proposed for insertion (with all before trigger effects carried forward). Having the column reference go through an alias like this might be tricky. -- Peter Geoghegan
On Thu, Oct 9, 2014 at 11:11 AM, Peter Geoghegan <pg@heroku.com> wrote: > On Thu, Oct 9, 2014 at 12:38 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >> Do not use CONFLICTING() which looks like it is a function. > > So is ROW(). Or COALESCE(). ROW and COALESCE behave almost like functions: they operate on any expression or value you pass to them.

db=# select coalesce('bar');
 coalesce
----------
 bar

Not so with CONFLICTING(): it only accepts a column name -- not a value -- and has knowledge of the surrounding statement that ordinary function-like constructs don't.

db=# INSERT into evt_type (name) values ('foo') on conflict UPDATE set name=conflicting('bar');
ERROR:  syntax error at or near "'bar'"
LINE 1: ...lues ('foo') on conflict UPDATE set name=conflicting('bar');

> If you don't have a word that you think would more clearly indicate > the intent of the expression, I'm happy to hear suggestions from > others. I also like NEW due to similarity with triggers, but I see your concern about it not actually being "new". Regards, Marti
On 9 October 2014 09:11, Peter Geoghegan <pg@heroku.com> wrote: >> You may also wish to support the AS keyword, as MERGE does to make the >> above even more clear. >> >> e.g. SET col = EXISTING.col + NEW.col > > That's less clear, IMV. EXISTING.col is col - the very same Var. So > why qualify that it's the existing value in one place but not the > other? In fact, you can't do that now with updates in general: > > postgres=# update upsert u set u.val = 'foo'; > ERROR: 42703: column "u" of relation "upsert" does not exist > LINE 1: update upsert u set u.val = 'foo'; > ^ > LOCATION: transformUpdateStmt, analyze.c:2068 YES, which is exactly why I did not say this, I said something different. > This does work, which is kind of what you outline: > > postgres=# update upsert u set val = u.val; > UPDATE 3 YES, which is why I said it. > But MERGE accepts the former in other systems (in general, and for > MERGE), where Postgres won't (for UPDATEs in general). Parse analysis > of UPDATE targetlists just rejects this outright. > > FWIW, is any of the two tuples reference here "NEW", in any sense? > Informally, I'd say the new value is the resulting row - the final row > value after the UPDATE. We want to refer to the existing row, and the > row proposed for insertion (with all before trigger effects carried > forward). YES, which is why I specifically requested the ability to reference "the incoming data". Common sense interpretations make for quicker and easier discussions. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Oct 9, 2014 at 1:33 AM, Marti Raudsepp <marti@juffo.org> wrote: > ROW and COALESCE behave almost like functions: they operate on any > expression or value you pass to them. Okay; then CONFLICTING() is like many of the XML expressions. -- Peter Geoghegan
On Thu, Oct 9, 2014 at 1:41 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > YES, which is why I specifically requested the ability to reference > "the incoming data". My point is that people are not really inclined to use an alias in UPDATEs in general when referring to the target. The thing that seems special (and worthy of special qualification) is the reference to what you call the "incoming data", and what I've called "tuples proposed for insertion" (after being affected by any before row triggers). -- Peter Geoghegan
On Thu, Oct 9, 2014 at 1:56 AM, Peter Geoghegan <pg@heroku.com> wrote: > My point is that people are not really inclined to use an alias in > UPDATEs in general when referring to the target. The thing that seems > special (and worthy of special qualification) is the reference to what > you call the "incoming data", and what I've called "tuples proposed > for insertion" (after being affected by any before row triggers). For simple cases, you might not even bother with CONFLICTING() - you might find it easier to just repeat the constant in the INSERT and UPDATE parts of the query. -- Peter Geoghegan
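For reference, the row-qualifier spelling being argued for in this subthread is the one that implementations of the clause ultimately settled on: PostgreSQL shipped EXCLUDED.column in 9.5, and SQLite (3.24+) likewise exposes the tuple proposed for insertion as a pseudo-table named excluded, referenced with ordinary column qualification rather than function-like syntax. A runnable sketch (table name and values are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t(k INTEGER PRIMARY KEY, v INTEGER)")
conn.execute("INSERT INTO t VALUES (1, 10)")

# Both tuples can appear in one SET expression: the unqualified column
# refers to the existing row, while excluded.v is the value that the
# failed insert proposed for that row.
conn.execute("INSERT INTO t VALUES (1, 32) "
             "ON CONFLICT(k) DO UPDATE SET v = v + excluded.v")

print(conn.execute("SELECT v FROM t WHERE k = 1").fetchone()[0])  # -> 42
```

Note that this matches the shape of Simon's "SET col = EXISTING.col + NEW.col" example, except that only the incoming tuple gets a qualifier, which is also the point Peter makes above about the existing row not needing one.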
V 1.3 is attached. Like Simon, I think that it's premature to commit to one particular value locking implementation. Personally, I think that approach #2 is what we'll end up using, but it makes no sense to not maintain both at once, since it requires relatively little effort to do so. At the very least it's a useful tool for reviewers, who would otherwise be denied the opportunity to test whether any given concurrency problem was attributable to the value locking implementation, or something else. So there are two variants of V 1.3 attached - one uses an unchanged value locking implementation #1, while the other uses an unchanged implementation #2.

Highlights
========

* No more "WITHIN unique_index_name". There is a new syntax that supersedes it. Unique index inference based on columns (or expressions) is *mandatory* for the ON CONFLICT UPDATE variant. It remains optional for the IGNORE variant, because I think someone could very reasonably not care which unique index was implicated. As discussed, this implies that partial unique indexes are no longer supported (but expression indexes work just fine). This may be revisited, but I suggest doing so in a later release.
Example of merging on the unique index on column "name":

postgres=# explain insert into capitals(name, population, altitude) values ('Riga', 1000, 5) on conflict (name) update set altitude = excluded(altitude);
                              QUERY PLAN
----------------------------------------------------------------------
 Insert on capitals  (cost=0.00..0.01 rows=1 width=0)
   ->  Result  (cost=0.00..0.01 rows=1 width=0)
   ->  Conflict Update on capitals  (cost=0.00..1.01 rows=1 width=52)
(3 rows)

When we cannot infer a unique index, it looks like this:

postgres=# explain insert into capitals(name, population, altitude) values ('Riga', 1000, 5) on conflict (population) update set altitude = excluded(altitude);
ERROR:  42P10: could not infer which unique index to use from expressions/columns provided for ON CONFLICT
LINE 1: ...e, population, altitude) values ('Riga', 1000, 5) on conflic...
                                                             ^
HINT:  Partial unique indexes are not supported.
LOCATION:  transformConflictClause, parse_clause.c:2407

* No more planner kludges. Index paths are never created for the auxiliary UPDATE plan to begin with. To be tidy, TID scan paths are never added either. Unless the UPDATE will never proceed due to a tautological predicate like "WHERE false", we're guaranteed to have a "sequential scan" in a fairly principled way (as before, we just use this with EvalPlanQual(); it's just an implementation detail). I am not entirely qualified to say so, but I think that this makes my modifications to the optimizer look quite reasonable. The kludge of enforcing various restrictions (e.g. on subqueries appearing in the UPDATE) in the optimizer is also removed. We prefer to enforce everything during parse analysis. This also gets us better error messages, which is nice.

* Sane (although limited) support for table inheritance and updatable views. However, in general user-defined rules are unsupported - I cannot see how that could be made sane.
The IGNORE variant works for updatable views, and for inheritance relations with children (provided that there is no inference required, which effectively makes the UPDATE variant unsupported). However, both IGNORE and UPDATE variants work for relations that happen to be in an inheritance hierarchy, provided they have no children. I think it's fine to only support the IGNORE variant for relations with inheritance children, because even when users have the "partitioning pattern" use-case, there is no principled way of telling the difference between a vanilla INSERT, and an INSERT with an ON CONFLICT UPDATE clause from within the custom redirection trigger function. Also, unique indexes already only work at the relation level with inheritance - that's a long-standing limitation. And so, in general INSERTs better have the right inheritance child as their target - this is no more and no less true when there is an ON CONFLICT UPDATE clause (note that I'm talking about the "object orientated" inheritance use-case here, and not the "partitioning pattern" use-case - with the latter, it's all up to the trigger function to do the right thing). There may currently be an "UPDATE ONLY", but there is no "INSERT ONLY" - why should I add one? * Both INSERT and UPDATE sets of statement level triggers fire for INSERT with ON CONFLICT UPDATE. The number of rows affected doesn't matter, nor does it matter how they were or were not affected. This certainly seems like the correct behavior. Per-row triggers work the same as before, since far more thought went into their behavior earlier. This incorporates feedback from Kevin. * CONFLICTING() is renamed to EXCLUDED(). I know that some people wanted me to go a different way with this. I think that there are very good practical reasons not to [1], as well as good reasons related to design, but I still accept that CONFLICTING() isn't very descriptive. This spelling seems a lot better. 
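The inference rule in the Highlights above (the conflict target columns must identify an appropriate unique index, or the statement is rejected outright rather than falling back to a scan) is enforced the same way in SQLite's later adoption of the clause, which makes it easy to demonstrate; the capitals table here mirrors the EXPLAIN example, and the exact error text is SQLite's, not the patch's:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE capitals("
             "name TEXT UNIQUE, population INTEGER, altitude INTEGER)")

# "name" carries a unique constraint, so it can serve as the arbiter:
conn.execute("INSERT INTO capitals VALUES ('Riga', 1000, 5) "
             "ON CONFLICT(name) DO UPDATE SET altitude = excluded.altitude")

# "population" has no unique index, so no arbiter can be inferred, and
# the statement fails at prepare time rather than at execution:
try:
    conn.execute("INSERT INTO capitals VALUES ('Riga', 1000, 5) "
                 "ON CONFLICT(population) "
                 "DO UPDATE SET altitude = excluded.altitude")
except sqlite3.OperationalError as e:
    print("rejected:", e)
```

The semantic dependence on a particular unique index, rather than on whatever index the planner happens to pick, is exactly what the parse-analysis-versus-planning argument later in the thread is about.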
Clean-up
=======

If you take a look at the EXPLAIN output above, you'll see that the sequential scan node does not appear. Basically, I'm back to suppressing the implementation detail of that never-executed "sequential scan", but the new approach is far better than earlier approaches. Certain things actually associated with the sequential scan are now attributed to the parent (auxiliary) ModifyTable UPDATE node. Example (incidentally, note that "key" is used to infer a unique index to take as an arbiter index):

postgres=# explain insert into upsert values (1000, 'Plucky') on conflict (key) update set val = excluded(val) where key = 5;
                             QUERY PLAN
---------------------------------------------------------------------
 Insert on upsert  (cost=0.00..0.01 rows=1 width=0)
   ->  Result  (cost=0.00..0.01 rows=1 width=0)
   ->  Conflict Update on upsert  (cost=0.00..25.38 rows=6 width=36)
         Filter: (key = 5)
(4 rows)

This seems reasonable to me. Including the "(never executed)" sequential scan would be very confusing to users.

Inference
=======

Unique index inference (i.e. the way we figure out *which* unique index to use) occurs during parse analysis. I think it would be inappropriate, and certainly inconvenient to do it during planning. I maintain that ON CONFLICT DML statements have a legitimate semantic dependence on particular unique indexes, which makes this appropriate. I don't know about others, but my only problem with naming unique indexes directly was that it is unmaintainable, and very ugly. In principle, I think this semantic dependence is quite reasonable, and reflects the reality of how we all expect this to work. There are comments on the implications for plan caching and how that relates to where and when we perform unique index inference.

Documentation
===========

The documentation has been updated, incorporating feedback. I also made the cardinality violation error a lot clearer than before, since Craig said that was unclear.

Tests
====

Many tests were added.
I've added a couple of new isolation tests. insert-conflict-update-3 should be of particular interest. That test illustrates the visibility issues with the WHERE clause that I've already highlighted as a possible concern [2]. Unique index inference tests will also give you a fair idea of how flexible it is.

Remaining open items
=================

Apart from the obvious issue of value locking (i.e. verifying the correctness of its implementation in general), the only open items are: * RLS needs to be considered. I have yet to give it any real thought. * I could probably do better at postgres_fdw support. That seems like something that could be followed up on later, because it's clearly just about a Simple Matter of Programming. In summary, I was able to remove a lot of TODO/FIXME items here - almost all of them. I'm pretty happy about that. I'll have to edit the UPSERT wiki page, to strike out many open items... [1] http://www.postgresql.org/message-id/CAM3SWZQhiXQi1osT14V7spjQrUpmcnRtbXJe846-EB1bC+9i1g@mail.gmail.com [2] https://wiki.postgresql.org/wiki/UPSERT#Visibility_issues_and_the_proposed_syntax_.28WHERE_clause.2Fpredicate_stuff.29 -- Peter Geoghegan
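The WHERE clause in the EXPLAIN example above attaches a predicate to the auxiliary UPDATE: the conflict is still detected (and, in the patch, the conflicting row is still locked, which a single-connection sketch cannot show), but the update only proceeds when the predicate holds. The predicate behavior itself can be sketched with SQLite's version of the clause, using illustrative table contents:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE upsert(key INTEGER PRIMARY KEY, val TEXT)")
conn.execute("INSERT INTO upsert VALUES (1, 'Giraffe'), (5, 'Zebra')")

stmt = ("INSERT INTO upsert VALUES (?, 'Plucky') "
        "ON CONFLICT(key) DO UPDATE SET val = excluded.val WHERE key = 5")

conn.execute(stmt, (1,))  # conflicts, but the predicate fails: no update
conn.execute(stmt, (5,))  # conflicts and the predicate holds: row updated

print(conn.execute("SELECT key, val FROM upsert ORDER BY key").fetchall())
# -> [(1, 'Giraffe'), (5, 'Plucky')]
```

The visibility question flagged in [2] is about *which* version of the conflicting row such a predicate is evaluated against under concurrency, which is what insert-conflict-update-3 exercises.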
On Thu, Oct 23, 2014 at 6:43 PM, Peter Geoghegan <pg@heroku.com> wrote: > Documentation > =========== > > The documentation has been updated, incorporating feedback. I also > made the cardinality violation error a lot clearer than before, since > Craig said that was unclear. For the convenience of those that want to read the documentation quickly to figure out one particular aspect of user-visible behavior, the latest revision is uploaded here: http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/on-conflict-docs Of particular interest: http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/on-conflict-docs/sql-insert.html http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/on-conflict-docs/transaction-iso.html#XACT-READ-COMMITTED http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/on-conflict-docs/trigger-definition.html http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/on-conflict-docs/sql-createview.html#SQL-CREATEVIEW-UPDATABLE-VIEWS http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/on-conflict-docs/ddl-inherit.html -- Peter Geoghegan
On Thu, Oct 23, 2014 at 9:43 PM, Peter Geoghegan <pg@heroku.com> wrote: > * CONFLICTING() is renamed to EXCLUDED(). I know that some people > wanted me to go a different way with this. I think that there are very > good practical reasons not to [1], as well as good reasons related to > design, but I still accept that CONFLICTING() isn't very descriptive. > This spelling seems a lot better. Specifically, "some people" included at least three committers and at least one other community member no less prominent than yourself, or in other words, every single person who bothered to comment. You can think whatever you like; the chances of it being committed that way are zero. > Unique index inference (i.e. the way we figure out *which* unique > index to use) occurs during parse analysis. I think it would be > inappropriate, and certainly inconvenient to do it during planning. You're wrong. The choice of which index to use is clearly wildly inappropriately placed in the parse analysis phase, and if you think that has any chance of ever being committed by anyone, then you are presuming the existence of a committer who won't mind ignoring the almost-immediate revert request that would draw from, at the very least, Tom. As far as syntax goes, I thought the INSERT .. ON CONFLICT UPDATE syntax proposed upthread was the best of any mentioned thus far. The MERGE-based syntaxes proposed upthread were crazily verbose for no discernable benefit. As much as I'd like to have this feature, your refusal to change anything except when asked at least three times each by about five different people makes this effort barely worth pursuing. You can say all you like that you're receptive to feedback, but multiple people here are telling you that they feel otherwise. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 10/24/2014 10:04 AM, Robert Haas wrote: > As far as syntax goes, I thought the INSERT .. ON CONFLICT UPDATE > syntax proposed upthread was the best of any mentioned thus far. The > MERGE-based syntaxes proposed upthread were crazily verbose for no > discernable benefit. For those of us who haven't followed every post in this thread, is there somewhere I can see the proposed syntax? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On Fri, Oct 24, 2014 at 1:18 PM, Josh Berkus <josh@agliodbs.com> wrote: > On 10/24/2014 10:04 AM, Robert Haas wrote: >> As far as syntax goes, I thought the INSERT .. ON CONFLICT UPDATE >> syntax proposed upthread was the best of any mentioned thus far. The >> MERGE-based syntaxes proposed upthread were crazily verbose for no >> discernable benefit. > > For those of us who haven't followed every post in this thread, is there > somewhere I can see the proposed syntax? There are a couple of different proposals but this should give you a feeling for where we're at: http://www.postgresql.org/message-id/CA+TgmoZN=2AJKi1n4Jz5BkmYi8r_CPUDW+DtoppmTeLVmsOoqw@mail.gmail.com The part I like is the idea of making UPSERT look like an INSERT statement with an additional clause that says how a unique violation should be handled. The part nobody except Peter likes is using functional notation like CONFLICTING() or EXCLUDED() to pull in values from the tuple causing the unique violation. And there are some other areas of disagreement about particular keywords and so on. But I personally like that general style much more than the alternatives derived from MERGE. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Fri, Oct 24, 2014 at 10:04 AM, Robert Haas <robertmhaas@gmail.com> wrote: > On Thu, Oct 23, 2014 at 9:43 PM, Peter Geoghegan <pg@heroku.com> wrote: >> * CONFLICTING() is renamed to EXCLUDED(). I know that some people >> wanted me to go a different way with this. I think that there are very >> good practical reasons not to [1], as well as good reasons related to >> design, but I still accept that CONFLICTING() isn't very descriptive. >> This spelling seems a lot better. > > Specifically, "some people" included at least three committers and at > least one other community member no less prominent than yourself, or > in other words, every single person who bothered to comment. You can > think whatever you like; the chances of it being committed that way > are zero. This is the situation with unique index inference all over again. You don't like a function-like expression. Okay. It would be a lot more helpful if instead of just criticizing, you *also* looked at my reasons for not wanting to go with something that would necessitate adding a dummy range table entry [1]. There are some very good practical reasons for avoiding that. We only do that (the OLD.* and NEW.* stuff) in the context of CREATE RULE and one or two other things. We don't play any games like that during parse analysis of ordinary optimizable statements, which is what this is. And those dummy RTEs are always placeholders, AFAICT. Apart from those technical reasons, the two situations just aren't comparable from a user-visible perspective, but that isn't the main problem. I don't particularly expect you to come around to my view here. But it would be nice to have the problems with dummy RTEs and so on acknowledged, so it's understood that that is in itself a strange new direction for parse analysis of ordinary optimizable statements. >> Unique index inference (i.e. the way we figure out *which* unique >> index to use) occurs during parse analysis. 
I think it would be >> inappropriate, and certainly inconvenient to do it during planning. > > You're wrong. The choice of which index to use is clearly wildly > inappropriately placed in the parse analysis phase, and if you think > that has any chance of ever being committed by anyone, then you are > presuming the existence of a committer who won't mind ignoring the > almost-immediate revert request that would draw from, at the very > least, Tom. Why? This has nothing to do with optimization. You either have an appropriate unique index available, in which case you can proceed (and the cost associated with value locking the index and so on is an inherent cost, kind of like inserting into an index in general). Or you don't, and you can't (you get an error, not a sequential scan). That's what I'd call a semantic dependence on the index. Yes, that's kind of unusual, but it's the reality. I can have the code do unique index inference during planning instead if you insist, but I think that that isn't useful. If you want me to do things that way, please explain why. Otherwise, even if I simply acquiesce to your wishes, I may end up missing the point. > As far as syntax goes, I thought the INSERT .. ON CONFLICT UPDATE > syntax proposed upthread was the best of any mentioned thus far. The > MERGE-based syntaxes proposed upthread were crazily verbose for no > discernable benefit. I don't think anyone is too insistent on pursuing that. I think that the INSERT .. ON CONFLICT UPDATE syntax has more or less emerged as the favorite. > As much as I'd like to have this feature, your refusal to change > anything except when asked at least three times each by about five > different people makes this effort barely worth pursuing. You can say > all you like that you're receptive to feedback, but multiple people > here are telling you that they feel otherwise. I'm tired of your assertions about why I haven't done certain things. It's not fair. 
I have incorporated a lot of feedback into V1.3 (and previous versions), which isn't acknowledged. Also, I moved to the USA this week, and things have been hectic. As I said in relation to unique index inference, please consider that I haven't immediately complied with that feedback because there are genuine technical obstacles that at the very least make it more difficult than it appears. And, I'm not necessarily able to comply quickly because of time constraints. [1] http://www.postgresql.org/message-id/CAM3SWZQhiXQi1osT14V7spjQrUpmcnRtbXJe846-EB1bC+9i1g@mail.gmail.com -- Peter Geoghegan
On Fri, Oct 24, 2014 at 3:01 PM, Peter Geoghegan <pg@heroku.com> wrote: > This is the situation with unique index inference all over again. You > don't like a function-like expression. Okay. It would be a lot more > helpful if instead of just criticizing, you *also* looked at my > reasons for not wanting to go with something that would necessitate > adding a dummy range table entry [1]. The problem here isn't that I haven't read your emails. I have read them all, including that one. Moreover, this isn't the first time you've asserted that someone hasn't read one of your emails. So if we're going to say what we each think would be helpful, then I think it would be helpful if you stopped accusing the people who are taking time to provide feedback on your work of having failed to read your emails. It's possible that there may be instances where that problem exists, but it beggars credulity to suppose that the repeated unanimous consensus against some of your design decisions is entirely an artifact of failure to pay attention. The fact is, I don't feel obliged to respond to every one of your emails, just as you don't respond to every one of mine. If you want this patch to ever get committed, it's your job to push it forward - not mine, not Simon's, and not Heikki's. Sometimes that means you have to solve hard problems instead of just articulating what they are. > There are some very good > practical reasons for avoiding that. We only do that (the OLD.* and > NEW.* stuff) in the context of CREATE RULE and one or two other > things. We don't play any games like that during parse analysis of > ordinary optimizable statements, which is what this is. And those > dummy RTEs are always placeholders, AFAICT. Apart from those technical > reasons, the two situations just aren't comparable from a user-visible > perspective, but that isn't the main problem. > > I don't particularly expect you to come around to my view here. 
But it > would be nice to have the problems with dummy RTEs and so on > acknowledged, so it's understood that that is in itself a strange new > direction for parse analysis of ordinary optimizable statements. You're conflating the user-visible syntax with the parse tree representation in way that is utterly without foundation. I don't have a position at this point on which parse-analysis representation is preferable, but it's completely wrong-headed to say that you "can't" make NEW.x produce the same parse-analysis output that your CONFLICTING(x) syntax would have created. Sure, it might be harder, but it's not that much harder, and it's definitely not an unsolvable problem. I do acknowledge that there might be a better syntax out there than NEW.x and OLD.x. I have not seen one proposed that I like better. Feel free to propose something. But don't keep re-proposing something that LOOKSLIKE(this) because nobody - other than perhaps you - likes that. And don't use the difficulty of parse analysis as a justification for your proposed syntax, because, except in extreme cases, there are going to be very few if any regular contributors to this mailing list who will accept that as a justification for one syntax over another. Syntax needs to be justified by being beautiful, elegant, precedent-driven, orthogonal, and minimalist - not by whether you might need an extra 25-75 lines of parse analysis code to make it work. >>> Unique index inference (i.e. the way we figure out *which* unique >>> index to use) occurs during parse analysis. I think it would be >>> inappropriate, and certainly inconvenient to do it during planning. >> >> You're wrong. 
>> The choice of which index to use is clearly wildly
>> inappropriately placed in the parse analysis phase, and if you think
>> that has any chance of ever being committed by anyone, then you are
>> presuming the existence of a committer who won't mind ignoring the
>> almost-immediate revert request that it would draw from, at the very
>> least, Tom.
>
> Why? This has nothing to do with optimization.

That is false. If there is more than one index that could be used, the system should select the best one. That is an optimization decision per se. Also, if a plan is saved - in the plancache, say, or in a view - the query can be re-planned if the index it depends on is dropped, but there's no way to redo parse analysis.

>> As much as I'd like to have this feature, your refusal to change
>> anything except when asked at least three times each by about five
>> different people makes this effort barely worth pursuing. You can say
>> all you like that you're receptive to feedback, but multiple people
>> here are telling you that they feel otherwise.
>
> I'm tired of your assertions about why I haven't done certain things.
> It's not fair. I have incorporated a lot of feedback into V1.3 (and
> previous versions), which isn't acknowledged. Also, I moved to the USA
> this week, and things have been hectic. As I said in relation to
> unique index inference, please consider that I haven't immediately
> complied with that feedback because there are genuine technical
> obstacles that at the very least make it more difficult than it
> appears. And, I'm not necessarily able to comply quickly because of
> time constraints.

Well, I'm equally tired of being asked to review patches that respond to only a small percentage of the feedback already given. I, too, sometimes lack the time to incorporate the feedback of others into my patches. When that happens, I don't re-post them until I do have the time.
I've even been known to drop patches altogether rather than continue arguing about them, as you will of course recall. There's no shame in taking longer to get something done, but asking other people to spend time on it when you haven't had time yourself can lead to frustrations. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Fri, Oct 24, 2014 at 1:07 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> The problem here isn't that I haven't read your emails. I have read
> them all, including that one. Moreover, this isn't the first time
> you've asserted that someone hasn't read one of your emails. So if
> we're going to say what we each think would be helpful, then I think
> it would be helpful if you stopped accusing the people who are taking
> time to provide feedback on your work of having failed to read your
> emails. It's possible that there may be instances where that problem
> exists, but it beggars credulity to suppose that the repeated
> unanimous consensus against some of your design decisions is entirely
> an artifact of failure to pay attention.

Okay.

> The fact is, I don't feel obliged to respond to every one of your
> emails, just as you don't respond to every one of mine. If you want
> this patch to ever get committed, it's your job to push it forward -
> not mine, not Simon's, and not Heikki's. Sometimes that means you
> have to solve hard problems instead of just articulating what they
> are.

Okay.

> You're conflating the user-visible syntax with the parse tree
> representation in a way that is utterly without foundation. I don't
> have a position at this point on which parse-analysis representation
> is preferable, but it's completely wrong-headed to say that you
> "can't" make NEW.x produce the same parse-analysis output that your
> CONFLICTING(x) syntax would have created. Sure, it might be harder,
> but it's not that much harder, and it's definitely not an unsolvable
> problem.

I don't believe I did. The broader point is that the difficulty in making that work reflects the conceptual messiness, from user-visible aspects down. I can work with the difficulty, and I may even be able to live with the messiness, but I'm trying to bring the problems with it to a head sooner rather than later for good practical reasons.
In all sincerity, my real concern is that you or the others will change your mind when I actually go and implement an OLD.* style syntax, and see the gory details. I regret it if asking this is asking too much of you, but FYI that's the thought process behind it.

> I do acknowledge that there might be a better syntax out there than
> NEW.x and OLD.x. I have not seen one proposed that I like better.
> Feel free to propose something. But don't keep re-proposing something
> that LOOKSLIKE(this) because nobody - other than perhaps you - likes
> that.

Okay.

So in an UPDATE targetlist, you can assign DEFAULT to a column. Maybe that's an interesting precedent. During rewriting, this gets rewritten such that you end up with something that looks to the planner as if the original query included a constant (this actually comes from a catalog look-up for the column during rewriting). What if we spelled EXCLUDING/CONFLICTING as follows:

INSERT INTO upsert VALUES(1, 'Art') ON CONFLICT (key)
UPDATE SET val = EXCLUDED || 'this works' WHERE another_col != EXCLUDED;

Then rewriting would figure these details out. From a design perspective, there'd need to be a few details worked out about how inference actually works - inferring *which* column the EXCLUDED expression actually referred to - but it seems doable, especially given the existing restrictions on the structure of the UPDATE. We're not rewriting from a SetToDefault to a constant, but a SetToDefault-like thing to a special Var (actually, the finished representation probably makes it to the execution stage with that Var representation filled in, unlike SetToDefault, but it's basically the same pattern). It solves my problem with dummy range table entries. Actually, *any* new kind of expression accomplishes this just as well. My concern here is more around not needing cute tricks with dummy RTEs than it is around being in favor of any particular expression-based syntax.

What do you think of that?
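[Editor's note: the DEFAULT precedent referred to above can be seen in stock SQL; a minimal sketch, with an illustrative table, and using the patch's in-discussion (non-final) ON CONFLICT spelling in the analogy:]

```sql
-- Existing precedent: DEFAULT in an UPDATE targetlist. During query
-- rewriting, the SetToDefault node is replaced by the column's default
-- expression, looked up in the catalogs, so the planner never sees
-- anything special.
CREATE TABLE upsert (key int PRIMARY KEY, val text DEFAULT 'n/a');
UPDATE upsert SET val = DEFAULT WHERE key = 1;

-- The proposal is analogous: a SetToDefault-like node, spelled EXCLUDED,
-- would be rewritten into a special Var referencing the corresponding
-- column of the tuple proposed for insertion (proposed syntax only):
INSERT INTO upsert VALUES (1, 'Art')
ON CONFLICT (key) UPDATE SET val = EXCLUDED || 'this works';
```
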
> And don't use the difficulty of parse analysis as a > justification for your proposed syntax, because, except in extreme > cases, there are going to be very few if any regular contributors to > this mailing list who will accept that as a justification for one > syntax over another. Syntax needs to be justified by being beautiful, > elegant, precedent-driven, orthogonal, and minimalist - not by whether > you might need an extra 25-75 lines of parse analysis code to make it > work. Well, that isn't the case here. I'd much rather code my way out of a disagreement with you. >>>> Unique index inference (i.e. the way we figure out *which* unique >>>> index to use) occurs during parse analysis. I think it would be >>>> inappropriate, and certainly inconvenient to do it during planning. >>> >>> You're wrong. The choice of which index to use is clearly wildly >>> inappropriately placed in the parse analysis phase, and if you think >>> that has any chance of ever being committed by anyone, then you are >>> presuming the existence of a committer who won't mind ignoring the >>> almost-immediate revert request that would draw from, at the very >>> least, Tom. >> >> Why? This has nothing to do with optimization. > > That is false. If there is more than one index that could be used, > the system should select the best one. That is an optimization > decision per se. Also, if a plan is saved - in the plancache, say, or > in a view - the query can be re-planned if the index it depends on is > dropped, but there's no way to do parse analysis. Generating index paths for the UPDATE is a waste of cycles. Theoretically, there could be an (a, b, c) unique index and a (c,b,a) unique index, and those two might have a non-equal cost to scan. 
But that almost certainly isn't going to happen in practice, since that's a rather questionable indexing strategy, and even when it does, you're going to have to insert into all the unique indexes a good proportion of the time anyway, making the benefits of that approach pale in comparison to the costs. And that's just the cost in CPU cycles, and not code complexity. I don't know why you want to bring a cost model, or a choice of indexes, into this. It simply isn't comparable to how the system comes up with which index to use in all other contexts.

Now, I could easily make all this happen during planning, just to not have to argue with you, but I think that doing so is less similar to how things already work, not more similar. It certainly doesn't imply more code reuse, since get_relation_info() is clearly quite unsuitable.

> Well, I'm equally tired of being asked to review patches that respond
> to only a small percentage of the feedback already given. I, too,
> sometimes lack the time to incorporate the feedback of others into my
> patches. When that happens, I don't re-post them until I do have the
> time. I've even been known to drop patches altogether rather than
> continue arguing about them, as you will of course recall. There's no
> shame in taking longer to get something done, but asking other people
> to spend time on it when you haven't had time yourself can lead to
> frustrations.

From my point of view, I spent a significant amount of time making the patch more or less match your proposed design for unique index inference. It is discouraging to hear that you think I'm not cooperating with community process. I'm doing my best. I think it would be a bad idea for me to not engage with the community for an extended period at this point. There were plenty of other issues addressed by V1.3 that were not the CONFLICTING()/EXCLUDING thing that you highlighted (or the other thing you highlighted around where to do unique index inference).

-- Peter Geoghegan
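[Editor's note: the hypothetical discussed above - two unique indexes over the same columns in different orders, either of which could arbitrate a conflict - can be sketched with ordinary DDL; table and index names are illustrative:]

```sql
-- Two unique indexes enforcing equivalent constraints over the same
-- columns. Either index could serve as the arbiter for ON CONFLICT,
-- and scanning one may in principle be cheaper than the other.
-- A questionable schema, as noted, but a legal one.
CREATE TABLE t (a int, b int, c int);
CREATE UNIQUE INDEX t_abc ON t (a, b, c);
CREATE UNIQUE INDEX t_cba ON t (c, b, a);
```
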
On 25/10/14 00:48, Peter Geoghegan wrote: >> You're conflating the user-visible syntax with the parse tree >> representation in way that is utterly without foundation. I don't >> have a position at this point on which parse-analysis representation >> is preferable, but it's completely wrong-headed to say that you >> "can't" make NEW.x produce the same parse-analysis output that your >> CONFLICTING(x) syntax would have created. Sure, it might be harder, >> but it's not that much harder, and it's definitely not an unsolvable >> problem. > > I don't believe I did. The broader point is that the difficulty in > making that work reflects the conceptually messiness, from > user-visible aspects down. I can work with the difficulty, and I may > even be able to live with the messiness, but I'm trying to bring the > problems with it to a head sooner rather than later for good practical > reasons. In all sincerity, my real concern is that you or the others > will change your mind when I actually go and implement an OLD.* style > syntax, and see the gory details. I regret it if to ask this is to ask > too much of you, but FYI that's the thought process behind it. > If you feel so strongly that it's wrong even though everybody else seems to prefer it and if you at the same time feel so strongly about people changing minds once you implement this, maybe the best way to convince us is to show us the implementation (at this point it would probably have taken less of your time than the argument did). > > So in an UPDATE targetlist, you can assign DEFAULT to a column. Maybe > that's an interesting precedent. During rewriting, this gets rewritten > such that you end up with something that looks to the planner as if > the original query included a constant (this actually comes from a > catalog look-up for the column during rewriting). 
What if we spelled > EXCLUDING/CONFLICTING as follows: > > INSERT INTO upsert VALUES(1, 'Art') ON CONFLICT (key) UPDATE SET val = > EXCLUDED || 'this works' WHERE another_col != EXCLUDED; > > Then rewriting would figure these details out. From a design > perspective, there'd need to be a few details worked out about how > inference actually works - inferring *which* column the EXCLUDED > expression actually referred to, but it seems doable, especially given > the existing restrictions on the structure of the UPDATE. We're not > rewriting from a SetToDefault to a constant, but a SetToDefault-like > thing to a special Var (actually, the finished representation probably > makes it to the execution stage with that Var representation filled > in, unlike SetToDefault, but it's basically the same pattern). It > solves my problem with dummy range table entries. Actually, *any* new > kind of expression accomplishes this just as well. My concern here is > more around not needing cute tricks with dummy RTEs than it is around > being in favor of any particular expression-based syntax. > > What do you think of that? > Ugh, you want to auto-magically detect what value is behind the EXCLUDED based on how/where it's used in the UPDATE? That seems like quite a bad idea. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Oct 24, 2014 at 4:39 PM, Petr Jelinek <petr@2ndquadrant.com> wrote: > If you feel so strongly that it's wrong even though everybody else seems to > prefer it and if you at the same time feel so strongly about people changing > minds once you implement this, maybe the best way to convince us is to show > us the implementation (at this point it would probably have taken less of > your time than the argument did). No, it wouldn't have - I don't think anyone believes that. Magic addRangeTableEntryForRelation() calls are only used in the context of one or two utility statements that have pretty limited scope. Support for an OLD.* style syntax would have to exist at *all* stages of query execution, from parse analysis through to rewriting, planning, and execution. That's the difference here - this isn't a utility command. >> So in an UPDATE targetlist, you can assign DEFAULT to a column. Maybe >> that's an interesting precedent. During rewriting, this gets rewritten >> such that you end up with something that looks to the planner as if >> the original query included a constant (this actually comes from a >> catalog look-up for the column during rewriting). What if we spelled >> EXCLUDING/CONFLICTING as follows: >> >> INSERT INTO upsert VALUES(1, 'Art') ON CONFLICT (key) UPDATE SET val = >> EXCLUDED || 'this works' WHERE another_col != EXCLUDED; >> >> Then rewriting would figure these details out. From a design >> perspective, there'd need to be a few details worked out about how >> inference actually works - inferring *which* column the EXCLUDED >> expression actually referred to, but it seems doable, especially given >> the existing restrictions on the structure of the UPDATE. 
We're not >> rewriting from a SetToDefault to a constant, but a SetToDefault-like >> thing to a special Var (actually, the finished representation probably >> makes it to the execution stage with that Var representation filled >> in, unlike SetToDefault, but it's basically the same pattern). It >> solves my problem with dummy range table entries. Actually, *any* new >> kind of expression accomplishes this just as well. My concern here is >> more around not needing cute tricks with dummy RTEs than it is around >> being in favor of any particular expression-based syntax. >> >> What do you think of that? >> > > Ugh, you want to auto-magically detect what value is behind the EXCLUDED > based on how/where it's used in the UPDATE? That seems like quite a bad > idea. That's *exactly* how DEFAULT works within UPDATE targetlists. There might be a few more details to work out here, but not terribly many, and that's going to be true no matter what. 95%+ of the time, it'll just be "val = EXCLUDED" anyway. -- Peter Geoghegan
Peter Geoghegan wrote: > On Fri, Oct 24, 2014 at 4:39 PM, Petr Jelinek <petr@2ndquadrant.com> wrote: > > Ugh, you want to auto-magically detect what value is behind the EXCLUDED > > based on how/where it's used in the UPDATE? That seems like quite a bad > > idea. > > That's *exactly* how DEFAULT works within UPDATE targetlists. There > might be a few more details to work out here, but not terribly many, > and that's going to be true no matter what. 95%+ of the time, it'll > just be "val = EXCLUDED" anyway. Petr's thought mirrors mine. What are you going to do the other 5% of the time? Is there some other way to refer to the columns of the "excluded" row? -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
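[Editor's note: the objection raised here can be made concrete with a sketch, using the proposed (non-final) bare-EXCLUDED spelling and illustrative column names:]

```sql
-- Inference is obvious in the common case:
--   ... UPDATE SET val = EXCLUDED
-- (clearly the excluded tuple's "val"), but much less so when the
-- right-hand side mixes EXCLUDED with other columns, or uses it twice:
--   ... UPDATE SET val = EXCLUDED || other_col,
--                  other_col = CASE WHEN EXCLUDED > 0
--                                   THEN EXCLUDED ELSE 0 END
-- Which excluded column does each occurrence refer to? A qualified
-- spelling such as NEW.val / OLD.val carries no such ambiguity.
```
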
On Fri, Oct 24, 2014 at 6:48 PM, Peter Geoghegan <pg@heroku.com> wrote: >> You're conflating the user-visible syntax with the parse tree >> representation in way that is utterly without foundation. I don't >> have a position at this point on which parse-analysis representation >> is preferable, but it's completely wrong-headed to say that you >> "can't" make NEW.x produce the same parse-analysis output that your >> CONFLICTING(x) syntax would have created. Sure, it might be harder, >> but it's not that much harder, and it's definitely not an unsolvable >> problem. > > I don't believe I did. The broader point is that the difficulty in > making that work reflects the conceptually messiness, from > user-visible aspects down. I can work with the difficulty, and I may > even be able to live with the messiness, but I'm trying to bring the > problems with it to a head sooner rather than later for good practical > reasons. In all sincerity, my real concern is that you or the others > will change your mind when I actually go and implement an OLD.* style > syntax, and see the gory details. I regret it if to ask this is to ask > too much of you, but FYI that's the thought process behind it. I think what's more likely is that we'll complain about the way you chose to implement it. I don't believe your contention (in the other email) that "Support for an OLD.* style syntax would have to exist at *all* stages of query execution, from parse analysis through to rewriting, planning, and execution." I think if your design for implementing that syntax requires that, and your design for some other syntax doesn't require that, then you're not thinking clearly enough about what needs to happen in parse analysis. Make the parse analysis for this syntax emit the same representation that you would have had it emit in the other syntax, and you won't have this problem. 
> What if we spelled > EXCLUDING/CONFLICTING as follows: > > INSERT INTO upsert VALUES(1, 'Art') ON CONFLICT (key) UPDATE SET val = > EXCLUDED || 'this works' WHERE another_col != EXCLUDED; > > What do you think of that? I am in complete agreement with the comments made by Petr and Alvaro. >>>> You're wrong. The choice of which index to use is clearly wildly >>>> inappropriately placed in the parse analysis phase, and if you think >>>> that has any chance of ever being committed by anyone, then you are >>>> presuming the existence of a committer who won't mind ignoring the >>>> almost-immediate revert request that would draw from, at the very >>>> least, Tom. >>> >>> Why? This has nothing to do with optimization. >> >> That is false. If there is more than one index that could be used, >> the system should select the best one. That is an optimization >> decision per se. Also, if a plan is saved - in the plancache, say, or >> in a view - the query can be re-planned if the index it depends on is >> dropped, but there's no way to do parse analysis. > > Generating index paths for the UPDATE is a waste of cycles. > Theoretically, there could be an (a, b, c) unique index and a (c,b,a) > unique index, and those two might have a non-equal cost to scan. But > that almost certainly isn't going to happen in practice, since that's > a rather questionable indexing strategy, and even when it does, you're > going to have to insert into all the unique indexes a good proportion > of the time anyway, making the benefits of that approach pale in > comparison to the costs. I don't care whether you actually generate index-paths or not, and in fact I suspect it makes no sense to do so. But I do care that you do a cost comparison between the available indexes and pick the one that looks cheapest. If somebody's got a bloated index and builds a new one, they will want it to get used even before they drop the old one. 
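[Editor's note: the bloated-index scenario described above is common in practice; a sketch of the sequence, with illustrative names:]

```sql
-- The existing unique index t_key has become bloated. A replacement is
-- built without blocking writes; both indexes coexist until the old
-- one is dropped. During that window, conflict arbitration should
-- prefer the new, cheaper index - a costing decision, hence an
-- argument for doing index selection in the planner.
CREATE UNIQUE INDEX CONCURRENTLY t_key_new ON t (key);
-- ... upserts arriving now should use t_key_new, not bloated t_key ...
DROP INDEX CONCURRENTLY t_key;
```
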
Even if that weren't an issue, though, the fact that you can't re-parse but you can re-plan is a killer point AFAICS. It means you are borked if the statement gets re-executed after the index you picked is dropped.

> From my point of view, I spent a significant amount of time making the
> patch more or less match your proposed design for unique index
> inference. It is discouraging to hear that you think I'm not
> cooperating with community process. I'm doing my best. I think it
> would be a bad idea for me to not engage with the community for an
> extended period at this point. There were plenty of other issues
> addressed by V1.3 that were not the CONFLICTING()/EXCLUDING thing that
> you highlighted (or the other thing you highlighted around where to do
> unique index inference).

I think this gets at a point that has been bugging me and, perhaps, other people here. You often post a new patch with some fixes but without fixes for the issues that reviewers have indicated are top-of-mind for them. Sometimes, but not always, you say something like "I know this doesn't fix X but I'd like comments on other aspects of the patch". Even when you do, though, it can be difficult to look past something that appears to be a fundamental structural problem in order to comment on details, and sometimes it feels like that's what we're being asked to do.

When I'm reviewing, I tend to find issues more or less in proportion to the time I spend on the patch. If things that I complained about before haven't been fixed, I tend to find the same ones again, but not necessarily all that much faster than I found them the first time. So that's not efficient for me. I would not make an absolute rule that no patch should ever be re-posted without addressing every issue; I wouldn't be able to follow that rule in every case myself. But I try to follow it as often as I can, and I would suggest that you try to lean quite a bit more firmly in that direction.
I think you will make more progress, and spend less time arguing with others. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Sat, Oct 25, 2014 at 8:12 AM, Robert Haas <robertmhaas@gmail.com> wrote: >> Generating index paths for the UPDATE is a waste of cycles. >> Theoretically, there could be an (a, b, c) unique index and a (c,b,a) >> unique index, and those two might have a non-equal cost to scan. But >> that almost certainly isn't going to happen in practice, since that's >> a rather questionable indexing strategy, and even when it does, you're >> going to have to insert into all the unique indexes a good proportion >> of the time anyway, making the benefits of that approach pale in >> comparison to the costs. > > I don't care whether you actually generate index-paths or not, and in > fact I suspect it makes no sense to do so. But I do care that you do > a cost comparison between the available indexes and pick the one that > looks cheapest. If somebody's got a bloated index and builds a new > one, they will want it to get used even before they drop the old one. That seems extremely narrow. I am about ready to just do what you say, by costing the index based on something like relpages, but for the record I see no point. If I see no point, and you're sure that there is a point, then there is a danger that I'll miss the point, which contributes to my giving push back. I know I said that already, but I reiterate it once more for emphasis. > Even if that weren't an issue, though, the fact that you can't > re-parse but you can re-plan is a killer point AFAICS. It means you > are borked if the statement gets re-executed after the index you > picked is dropped. That simply isn't the case. As specifically noted in comments in the patch, relcache invalidation works in such a way as to invalidate the query proper, even when only an index has been dropped. 
For a time during development, the semantic dependence was more directly represented by actually having extract_query_dependencies() extract the arbitrating unique index pg_class Oid as a dependency for the benefit of the plan cache, but I ultimately decided against doing anything more than noting the potential future problem (the problem that arises in a world where relcache invalidation is more selective). But, I digress: I'll have inference occur during planning during the next revision since you think that's important. >> From my point of view, I spent a significant amount of time making the >> patch more or less match your proposed design for unique index >> inference. It is discouraging to hear that you think I'm not >> cooperating with community process. I'm doing my best. I think it >> would be a bad idea for me to not engage with the community for an >> extended period at this point. There were plenty of other issues >> address by V1.3 that were not the CONFLICTING()/EXCLUDING thing that >> you highlighted (or the other thing you highlighted around where to do >> unique index inference). > > I think this gets at a point that has been bugging me and, perhaps, > other people here. You often post a new patch with some fixes but > without fixes for the issues that reviewers have indicated are > top-of-mind for them. Sometimes, but not always, you say something > like "I know this doesn't fix X but I'd like comments on other aspects > of the patch". Even when you do, though, it can be difficult to > overlook something that appears to be a fundamental structural problem > to comment on details, and sometimes it feels like that's what we're > being asked to do. I think that it's accurate to say that I asked the community to do that once, last year. I was even very explicit about it at the time. I'm surprised that you think that's the case now, though. 
I learned my lesson a little later: don't do that, because fellow contributors are unwilling to go along with it, even for something as important as this. As recently as a few months ago, you wanted me to go a *completely* different direction with this (i.e. attempt an UPDATE first). I'm surprised that you're emphasizing the EXCLUDED()/CONFLICTING() thing so much. Should I take it that I more or less have your confidence that the way the patch fits together at a high level is sound? Because, that's the only way I can imagine you'd think that the EXCLUDED()/CONFLICTING() thing is of such fundamental importance at this point. It is after all split out as a much smaller commit on top of the main implementation (a commit which could easily be deferred). While it's important, it is very clearly subsidiary to the overall structure of my design, which I think that there has been precious little discussion of on this thread (e.g. the whole way I use EvalPlanQual(), the general idea of a special auxiliary plan that is executed in a special way, etc). That concerns me, because I suspect that the reason there has been next to no discussion of those aspects is because they're particularly hard things to have an opinion on, and yet are the things of immediate importance. Please correct me if I have it wrong. I am in no position to demand that you or anyone else discuss any particular aspect of the patch, of course. I am just conveying my concerns. Like all of us, I very much want to get this feature into the next release of PostgreSQL. > When I'm reviewing, I tend to find issues more or > less in proportion to the time I spend on the patch. If things that I > complained about before haven't been fixed, I tend to find the same > ones again, but not necessarily all that much faster than I found them > the first time. So that's not efficient for me. 
I would not make an > absolute rule that no patch should ever be re-posted without > addressing every issue; I wouldn't be able to follow that rule in > every case myself. But I try to follow it as often as I can, and I > would suggest that you try to lean quite a bit more firmly in that > direction. I think you will make more progress, and spend less time > arguing with others. I'll try. But let me point out that the *very first* thing you complained about, in relation to version 1.0, was that the plan structure was funny. V1.3 cleaned that up, making a "sequential scan" always occur, and having EXPLAIN treat that as a strict implementation detail, while still attributing some aspects of the hidden, unexecuted "sequential scan" (e.g. the qual) to its parent where that makes sense. This seems like something that squarely addressed your *original* concern. A little later, you (rightly) complained about the lack of support for inheritance, and to a lesser extent updatable views. As of V1.3, I have reasonable support for both of those things. And, of course, you wanted a unique index to be inferred from a set of columns/expressions, which (most notably) v1.3 now does. Also, I incorporated feedback from Kevin on some of those same items, as well as the behavior of statement-level triggers. Furthermore, I incorporated the feedback of Simon in having way more tests, in particular, two new isolation tests, one of which illustrates the qualitatively new "MVCC violation" in more detail. In short, I think I've done a pretty good job of incorporating feedback, and where I haven't I have been quite clear about it (it certainly didn't take you long to figure it out in this most recent instance). There is always room for improvement, but in my book V1.3 made a lot more than small, incremental progress. I am surprised by your remarks suggesting that the progress of each revision is excessively gradual in addressing your concerns. 
AFAICT, it's just that V1.3 wasn't totally comprehensive in doing so. In reality, the main reason for that is: getting this feature to a point where it might plausibly be committed is bloody difficult, as evidenced by the fact that it took this long for someone to produce a patch. Please don't lose sight of that. -- Peter Geoghegan
On Sun, Oct 26, 2014 at 4:39 PM, Peter Geoghegan <pg@heroku.com> wrote: >> I don't care whether you actually generate index-paths or not, and in >> fact I suspect it makes no sense to do so. But I do care that you do >> a cost comparison between the available indexes and pick the one that >> looks cheapest. If somebody's got a bloated index and builds a new >> one, they will want it to get used even before they drop the old one. > > That seems extremely narrow. I am about ready to just do what you say, > by costing the index based on something like relpages, but for the > record I see no point. If I see no point, and you're sure that there > is a point, then there is a danger that I'll miss the point, which > contributes to my giving push back. I know I said that already, but I > reiterate it once more for emphasis. I don't think it's either narrow or difficult: I think you just need to apply the existing index costing function. CLUSTER does a similar calculation to figure out whether to seq-scan or sort, even though it doesn't use index paths per se, or at least I don't think it does. >> Even if that weren't an issue, though, the fact that you can't >> re-parse but you can re-plan is a killer point AFAICS. It means you >> are borked if the statement gets re-executed after the index you >> picked is dropped. > > That simply isn't the case. As specifically noted in comments in the > patch, relcache invalidation works in such a way as to invalidate the > query proper, even when only an index has been dropped. For a time > during development, the semantic dependence was more directly > represented by actually having extract_query_dependencies() extract > the arbitrating unique index pg_class Oid as a dependency for the > benefit of the plan cache, but I ultimately decided against doing > anything more than noting the potential future problem (the problem > that arises in a world where relcache invalidation is more selective). 
I don't know what you mean by "invalidate the query proper". Here's my chain of reasoning: if the index selection occurs in parse analysis, then it's embedded in the parse tree. If this can be used in a writable CTE, then the parse tree can get stored someplace. If the index is then dropped, the parse tree is no good any more. Where's the flaw in that reasoning? I haven't looked at the patch to know exactly what will happen in that case, but it seems to me like it must either not work or there must be some ugly hack to make it work.

> But, I digress: I'll have inference occur during planning during the
> next revision since you think that's important.

Great.

> I think that it's accurate to say that I asked the community to do
> that once, last year. I was even very explicit about it at the time.
> I'm surprised that you think that's the case now, though. I learned my
> lesson a little later: don't do that, because fellow contributors are
> unwilling to go along with it, even for something as important as
> this.

I'm just telling you how I feel. I can't vouch for it being perfectly fair or accurate, though I do strive for that.

> As recently as a few months ago, you wanted me to go a *completely*
> different direction with this (i.e. attempt an UPDATE first). I'm
> surprised that you're emphasizing the EXCLUDED()/CONFLICTING() thing
> so much. Should I take it that I more or less have your confidence
> that the way the patch fits together at a high level is sound?

No. Commenting on one aspect of a patch doesn't imply agreement with other aspects of the patch. Please don't put words into my mouth. I haven't reviewed this patch in detail; I've only commented on specific aspects of it as they have arisen in discussion. I may or may not someday review it in detail, but not before I'm fairly confident that the known issues raised by other community members have been addressed as thoroughly as possible.
> In reality, the > main reason for that is: getting this feature to a point where it > might plausibly be committed is bloody difficult, as evidenced by the > fact that it took this long for someone to produce a patch. Please > don't lose sight of that. I'm not. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 27 October 2014 15:55, Robert Haas <robertmhaas@gmail.com> wrote: > Commenting on one aspect of a patch doesn't imply agreement with > other aspects of the patch. Please don't put words into my mouth. I > haven't reviewed this patch in detail; I've only commented on specific > aspects of it as they have arisen in discussion. I may or may not > someday review it in detail, but not before I'm fairly confident that > the known issues raised by other community members have been addressed > as thoroughly as possible. +1 -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Oct 27, 2014 at 9:43 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 27 October 2014 15:55, Robert Haas <robertmhaas@gmail.com> wrote:
>
>> Commenting on one aspect of a patch doesn't imply agreement with
>> other aspects of the patch. Please don't put words into my mouth. I
>> haven't reviewed this patch in detail; I've only commented on specific
>> aspects of it as they have arisen in discussion. I may or may not
>> someday review it in detail, but not before I'm fairly confident that
>> the known issues raised by other community members have been addressed
>> as thoroughly as possible.
>
> +1

I wasn't putting words in anyone's mouth; I *don't* think that it's true that Robert thinks patches 0001-* and 0002-* are perfectly fine, and implied as much myself. I just think the *strongly* worded disapproval of the user-visible interface of 0003-* was odd; it was way out of proportion to its immediate importance to getting this patch on track. AFAICT it was the *only* feedback that I didn't act on with V1.3 (Robert's complaint about how inference happens during parse analysis was a response to V1.3). I'm not always going to be able to act on every item of feedback immediately, or I'll have my own ideas about how to handle certain things. I don't think that's all that big of a deal, since I've acted on almost all feedback. I think by far the biggest problem here is the lack of attention to the design from others.

I did a lot of copy-editing to the Wiki page yesterday. There are actually few clear open items now:
https://wiki.postgresql.org/wiki/UPSERT#Open_Items

Some previous "open items" have been moved to here:
https://wiki.postgresql.org/wiki/UPSERT#Miscellaneous_odd_properties_of_proposed_ON_CONFLICT_patch

This is basically a section describing things that have not been controversial or in need of adjusting, and may well never be, but which I wish we'd talk about, because they're in some way novel or counter-intuitive.
This is the kind of thing I'd like us to discuss more.

-- Peter Geoghegan
On 27 October 2014 17:44, Peter Geoghegan <pg@heroku.com> wrote:
> I did a lot of copy-editing to the Wiki page yesterday. There are
> actually few clear open items now:
> https://wiki.postgresql.org/wiki/UPSERT#Open_Items

I've read this page. Please do these things, both of which have been requested multiple times...

1. Take the specific docs that relate to the patch and put them in one place, so that everybody can read and understand and agree the behaviour of the patch. So that someone reading that can see *exactly* what is being proposed, not read through pages of other unchanged material hoping to catch the few points that really differ.

2. Replace CONFLICTING() with what I have asked for in earlier posts. The request is not repeated again, to avoid confusion.

-- Simon Riggs
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Oct 27, 2014 at 11:12 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > 1. Take the specific docs that relate to the patch and put them in one > place, so that everybody can read and understand and agree the > behaviour of the patch. So that someone reading that can see *exactly* > what is being proposed, not read through pages of other unchanged > material hoping to catch the few points that really differ. I'm afraid I don't follow. I have links to the user-visible documentation (v1.3) on the Wiki: https://wiki.postgresql.org/wiki/UPSERT#Documentation The documentation is complete. I link to every interesting page from the documentation directly from the Wiki, too. Of course, I also describe the issues in more detail for those with an interest in the implementation on the Wiki page itself (and list open issues). I have isolation tests that illustrate the new facets of visibility for READ COMMITTED, too. How, specifically, have I failed to do what you ask here? If you want to see exactly what has changed, in a differential fashion, well, that's what a diff is for. I'm not aware of any existing way of rendering to html for readability, while highlighting what is new. -- Peter Geoghegan
On Mon, Oct 27, 2014 at 1:44 PM, Peter Geoghegan <pg@heroku.com> wrote: > I think by far the biggest problem here is the > lack of attention to the design from others. I find that attitude incredible. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Mon, Oct 27, 2014 at 1:22 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Mon, Oct 27, 2014 at 1:44 PM, Peter Geoghegan <pg@heroku.com> wrote: >> I think by far the biggest problem here is the >> lack of attention to the design from others. > > I find that attitude incredible. What I mean is that that's the thing that clearly needs scrutiny the most. That isn't an attitude - it's a technical opinion. -- Peter Geoghegan
On 27 October 2014 20:24, Peter Geoghegan <pg@heroku.com> wrote: > On Mon, Oct 27, 2014 at 1:22 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> On Mon, Oct 27, 2014 at 1:44 PM, Peter Geoghegan <pg@heroku.com> wrote: >>> I think by far the biggest problem here is the >>> lack of attention to the design from others. >> >> I find that attitude incredible. > > What I mean is that that's the thing that clearly needs scrutiny the > most. That isn't an attitude - it's a technical opinion. Let's see if we can link these two thoughts. 1. You think the biggest problem is the lack of attention to the design. 2. I keep asking you to put the docs in a readable form. If you can't understand the link between those two things, I am at a loss. Good luck with the patch. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Oct 27, 2014 at 4:34 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> Let's see if we can link these two thoughts.
>
> 1. You think the biggest problem is the lack of attention to the design.
>
> 2. I keep asking you to put the docs in a readable form.
>
> If you can't understand the link between those two things, I am at a loss.

You've read the docs. Please be clearer. In what sense are they not readable? The main description of the feature appears on the INSERT reference page:

http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/on-conflict-docs/sql-insert.html

These paragraphs were added (there is more on the INSERT reference page than mentioned here, but this is the main part):

"""
The optional ON CONFLICT clause specifies a path to take as an alternative to raising a uniqueness violation error. ON CONFLICT IGNORE simply avoids inserting any individual row when it is determined that a uniqueness violation error would otherwise need to be raised. ON CONFLICT UPDATE has the system take an UPDATE path in respect of such rows instead.

ON CONFLICT UPDATE guarantees an atomic INSERT or UPDATE outcome. While rows may be updated, the top-level statement is still an INSERT, which is significant for the purposes of statement-level triggers and the rules system. Note that in the event of an ON CONFLICT path being taken, RETURNING returns no value in respect of any not-inserted rows.

ON CONFLICT UPDATE optionally accepts a WHERE clause condition. When provided, the statement only proceeds with updating if the condition is satisfied. Otherwise, unlike a conventional UPDATE, the row is still locked for update. Note that the condition is evaluated last, after a conflict has been identified as a candidate to update.

ON CONFLICT UPDATE accepts EXCLUDED expressions in both its targetlist and WHERE clause. This allows expressions (in particular, assignments) to reference rows originally proposed for insertion.
Note that the effects of all per-row BEFORE INSERT triggers are carried forward. This is particularly useful for multi-insert ON CONFLICT UPDATE statements; when inserting or updating multiple rows, constants need only appear once.

There are several restrictions on the ON CONFLICT UPDATE clause that do not apply to UPDATE statements. Subqueries may not appear in either the UPDATE targetlist or its WHERE clause (although simple multi-assignment expressions are supported). WHERE CURRENT OF cannot be used. In general, only columns in the target table, and excluded values originally proposed for insertion, may be referenced. Operators and functions may be used freely, though.

INSERT with an ON CONFLICT UPDATE clause is a deterministic statement: an INSERT statement that would UPDATE any single row more than once raises a cardinality violation error. Rows proposed for insertion should not duplicate each other in terms of attributes constrained by the conflict-arbitrating unique index. Note that the ordinary rules for unique indexes with regard to NULL apply analogously when deciding whether an arbitrating unique index indicates that the alternative path should be taken; multiple NULL values across a set of rows proposed for insertion are therefore acceptable, and the statement will always insert (assuming there is no unrelated error). Note that merely locking a row (by having it not satisfy the WHERE clause condition) does not count towards whether or not the row has been affected multiple times (and whether or not a cardinality violation error is raised).

ON CONFLICT UPDATE requires a column specification, which is used to infer an index to limit pre-checking for duplicates to. ON CONFLICT IGNORE makes this optional.
If a would-be uniqueness violation originates in some unique index other than the single index anticipated as the sole source of conflicts, the statement can fail to insert a row (taking the IGNORE path) when raising a uniqueness violation error may actually have been appropriate; omitting the specification indicates a total indifference to where any would-be uniqueness violation could occur. Note that the ON CONFLICT UPDATE assignment may result in a uniqueness violation, just as with a conventional UPDATE.

The rules for unique index inference are straightforward. Columns and/or expressions specified must match all the columns/expressions of some existing unique index on table_name. The order of the columns/expressions in the index definition, whether or not the index definition specified NULLS FIRST or NULLS LAST, and the internal sort order of each column (whether DESC or ASC was specified) are all irrelevant. However, partial unique indexes are not supported as arbiters of whether an alternative ON CONFLICT path should be taken. Deferred unique constraints are also not accepted.

You must have INSERT privilege on a table in order to insert into it, as well as UPDATE privilege if and only if ON CONFLICT UPDATE is specified. If a column list is specified, you only need INSERT privilege on the listed columns. Similarly, when ON CONFLICT UPDATE is specified, you only need UPDATE privilege on the column(s) that are listed to be updated, as well as SELECT privilege on any column whose values are read in the ON CONFLICT UPDATE expressions or condition. Use of the RETURNING clause requires SELECT privilege on all columns mentioned in RETURNING. If you use the query clause to insert rows from a query, you of course need to have SELECT privilege on any table or column used in the query.
"""

On the same page, there is commentary on each new addition to the grammar (e.g. expression_index, column_name_index, expression).
There are some worked examples, too (there are now 3 ON CONFLICT related examples out of 12 total):

"""
Insert or update new distributors as appropriate. Assumes a unique index has been defined that constrains values appearing in the did column. Note that an EXCLUDED expression is used to reference values originally proposed for insertion:

INSERT INTO distributors (did, dname)
VALUES (5, 'Gizmo transglobal'), (6, 'Doohickey, inc')
ON CONFLICT (did) UPDATE
SET dname = EXCLUDED(dname) || ' (formerly ' || dname || ')'

Insert a distributor, or do nothing for rows proposed for insertion when an existing, excluded row (a row with a matching constrained column or columns after before row insert triggers fire) exists. Assumes a unique index has been defined that constrains values appearing in the did column (although since the IGNORE variant was used, the specification of columns to infer a unique index from is not mandatory):

INSERT INTO distributors (did, dname)
VALUES (7, 'Doodad GmbH')
ON CONFLICT (did) IGNORE

Insert or update new distributors as appropriate. Assumes a unique index has been defined that constrains values appearing in the did column. WHERE clause is used to limit the rows actually updated (any existing row not updated will still be locked, though):

-- Don't update any existing row if it was already renamed at some
-- earlier stage
INSERT INTO distributors (did, dname)
VALUES (8, 'Thingamabob Distribution')
ON CONFLICT (did) UPDATE
SET dname = EXCLUDED(dname) || ' (formerly ' || dname || ')'
WHERE dname NOT LIKE '%(formerly %)'
"""

If you do not provide actionable feedback, then I cannot do anything. That's frustrating for me. Saying that the docs are not in a "readable form" is too vague for me to act on. If I'm not mistaken, they're in the same form (the same medium - sgml doc patches) that they've been for every patch ever submitted to PostgreSQL (in terms of being easy to read against the backdrop of existing docs, which you mentioned before).
I have built the html docs and uploaded them myself, as an additional convenience to reviewers. What standard am I failing to meet here? I spent *hours* editing the Wiki yesterday to make the structure clearer (little new content was added), and make the information exactly current. I am bending over backwards to make the user-visible behaviors clearer, and to document the patch well, both from a user-visible perspective, and an internal perspective. I have internal docs for both implementations of value locking (#1 and #2). Then there's the value locking Wiki page. I have produced *reams* of documentation at this point, all in an effort to make things easier to understand.

The user visible doc patch diff stats:

---
 doc/src/sgml/ddl.sgml                 |  23 +++
 doc/src/sgml/indices.sgml             |  11 +-
 doc/src/sgml/mvcc.sgml                |  43 ++++--
 doc/src/sgml/plpgsql.sgml             |  20 ++-
 doc/src/sgml/postgres-fdw.sgml        |   8 ++
 doc/src/sgml/ref/create_index.sgml    |   7 +-
 doc/src/sgml/ref/create_rule.sgml     |   6 +-
 doc/src/sgml/ref/create_table.sgml    |   5 +-
 doc/src/sgml/ref/create_trigger.sgml  |   5 +-
 doc/src/sgml/ref/create_view.sgml     |  36 ++++-
 doc/src/sgml/ref/insert.sgml          | 258 ++++++++++++++++++++++++++++++++--
 doc/src/sgml/ref/set_constraints.sgml |   6 +-
 doc/src/sgml/trigger.sgml             |  46 ++++--
 13 files changed, 426 insertions(+), 48 deletions(-)

#1 internal documentation stats:

 doc/src/sgml/indexam.sgml        | 133 ++++++++++++++++++++++++++++++++++++---
 src/backend/access/nbtree/README |  90 +++++++++++++++++++++++++-
 src/backend/executor/README      |  35 +++++++++++
 3 files changed, 247 insertions(+), 11 deletions(-)

#2 internal documentation stats:

---
 src/backend/executor/README | 49 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 49 insertions(+)

Maybe those INSERT reference paragraphs could use another pass of copy-editing, but that's not what you said - you suggested some far more significant issue.
Some other issues, like exactly how triggers work with the feature are discussed on dedicated pages rather than the main INSERT reference page, which seems consistent with the existing documentation. Have I not provided a total of 4 isolation tests illustrating interesting concurrency/visibility interactions? That's a lot of isolation tests. Here is the tests commit stat: 31 files changed, 1159 insertions(+), 8 deletions(-) I really don't have any idea what you'd like me to do. Once again: please be more specific. I don't know what you mean. -- Peter Geoghegan
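[Editor's note] The insert-or-update behavior described in the documentation excerpts above can be sketched with SQLite's analogous upsert clause (available since SQLite 3.24), via Python's stdlib sqlite3 module. This is an analogue for illustration only, not the proposed PostgreSQL patch: SQLite spells the clause ON CONFLICT ... DO UPDATE, and what the patch writes as EXCLUDED(col) appears as excluded.col. The table and data are made up.

```python
# Sketch of insert-or-update semantics using SQLite's upsert clause
# (SQLite >= 3.24). Assumed analogue only; syntax differs from the patch.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE distributors (did INTEGER PRIMARY KEY, dname TEXT)")
conn.execute("INSERT INTO distributors VALUES (5, 'Gizmo transglobal')")

# did=5 conflicts and takes the UPDATE path; did=6 is simply inserted.
conn.execute("""
    INSERT INTO distributors (did, dname)
    VALUES (5, 'Gizmo GmbH'), (6, 'Doohickey, inc')
    ON CONFLICT (did) DO UPDATE
    SET dname = excluded.dname || ' (formerly ' || dname || ')'
""")

rows = dict(conn.execute("SELECT did, dname FROM distributors ORDER BY did"))
print(rows)
# {5: 'Gizmo GmbH (formerly Gizmo transglobal)', 6: 'Doohickey, inc'}
```

Note how excluded.dname refers to the value proposed for insertion, while the bare dname refers to the existing row, mirroring the EXCLUDED expressions described in the quoted docs.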
> ---
> doc/src/sgml/ddl.sgml                 |  23 +++
> doc/src/sgml/indices.sgml             |  11 +-
> doc/src/sgml/mvcc.sgml                |  43 ++++--
> doc/src/sgml/plpgsql.sgml             |  20 ++-
> doc/src/sgml/postgres-fdw.sgml        |   8 ++
> doc/src/sgml/ref/create_index.sgml    |   7 +-
> doc/src/sgml/ref/create_rule.sgml     |   6 +-
> doc/src/sgml/ref/create_table.sgml    |   5 +-
> doc/src/sgml/ref/create_trigger.sgml  |   5 +-
> doc/src/sgml/ref/create_view.sgml     |  36 ++++-
> doc/src/sgml/ref/insert.sgml          | 258 ++++++++++++++++++++++++++++++++--
> doc/src/sgml/ref/set_constraints.sgml |   6 +-
> doc/src/sgml/trigger.sgml             |  46 ++++--
> 13 files changed, 426 insertions(+), 48 deletions(-)
>
> #1 internal documentation stats:
>
> doc/src/sgml/indexam.sgml        | 133 ++++++++++++++++++++++++++++++++++++---
> src/backend/access/nbtree/README |  90 +++++++++++++++++++++++++-
> src/backend/executor/README      |  35 +++++++++++
> 3 files changed, 247 insertions(+), 11 deletions(-)
>
> #2 internal documentation stats:
>
> ---
> src/backend/executor/README | 49 +++++++++++++++++++++++++++++++++++++++
> 1 file changed, 49 insertions(+)

Just to put that in context, here are the documentation changes from the original LATERAL commit:

 doc/src/sgml/keywords.sgml   |   2 +-
 doc/src/sgml/queries.sgml    |  83 +++++++++++++++++++++++++++++++++++++++++-
 doc/src/sgml/ref/select.sgml | 102 +++++++++++++++++++++++++++++++++++++++++++++-------

Commit 0ef0b30 added data-modifying CTE docs (docs only). That looks like:

 doc/src/sgml/queries.sgml    | 177 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++---
 doc/src/sgml/ref/select.sgml |  49 ++++++++++++++---
 2 files changed, 214 insertions(+), 12 deletions(-)

> Have I not provided a total of 4 isolation tests illustrating
> interesting concurrency/visibility interactions? That's a lot of
> isolation tests.
> Here is the tests commit stat:
>
> 31 files changed, 1159 insertions(+), 8 deletions(-)

And to put the tests in context, here are the stats from the original Hot Standby commit:

 src/test/regress/expected/hs_standby_allowed.out    | 215 ++++++++++++++++++++++++++
 src/test/regress/expected/hs_standby_check.out      |  20 +++
 src/test/regress/expected/hs_standby_disallowed.out | 137 +++++++++++++++++
 src/test/regress/expected/hs_standby_functions.out  |  40 +++++
 src/test/regress/pg_regress.c                       |  33 ++--
 src/test/regress/sql/hs_primary_extremes.sql        |  74 +++++++++
 src/test/regress/sql/hs_primary_setup.sql           |  25 +++
 src/test/regress/sql/hs_standby_allowed.sql         | 121 +++++++++++++++
 src/test/regress/sql/hs_standby_check.sql           |  16 ++
 src/test/regress/sql/hs_standby_disallowed.sql      | 105 +++++++++++++
 src/test/regress/sql/hs_standby_functions.sql       |  24 +++
 src/test/regress/standby_schedule                   |  21 +++

So (at least as measured by raw lines of tests), this feature is better tested than the original Hot Standby commit, and by a wide margin. Tests also serve as an explanatory tool for the feature (in particular, isolation tests can be used in this way).

-- Peter Geoghegan
On Mon, Oct 27, 2014 at 5:15 PM, Peter Geoghegan <pg@heroku.com> wrote: >> Let's see if we can link these two thoughts. >> >> 1. You think the biggest problem is the lack of attention to the design. >> >> 2. I keep asking you to put the docs in a readable form. >> >> If you can't understand the link between those two things, I am at a loss. > > You've read the docs. Please be clearer. In what sense are they not > readable? The main description of the feature appears on the INSERT > reference page: > > http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/on-conflict-docs/sql-insert.html I've updated that reference page. I did a fair amount of copy-editing, but also updated the docs to describe the latest (unpublished) refinements to the syntax. Which is, as you and Robert requested, that the target and rejected-for-insertion tuples may be referenced with magical aliases in the style of OLD.* and NEW.*. I've spelt these aliases as TARGET.* and EXCLUDED.*, since OLD.* and NEW.* didn't seem to make much sense here. This requires some special processing during rewriting (which, as you probably know, is true of DML statements in general), and is certainly more invasive than what I had before, but all told isn't too bad. Basically, there is still an ExcludedExpr, but it only appears in the post-rewrite query tree, and is never created by the raw grammar or processed during parse analysis. I attach the doc patch with the relevant changes, in case you'd like a quick reference to where things are changed. I have already implemented the two things that you and Robert asked for most recently: A costing model for unique index inference, and the above syntax. I've also added IGNORE support to postgres_fdw (so you can IGNORE if and only if a unique index inference specification is omitted, just as with updatable views since V1.3). 
Currently, I'm working on fixing an issue with RLS that I describe in detail here: https://wiki.postgresql.org/wiki/UPSERT#RLS Once I fix that (provided it doesn't take too long), I'll publish a V1.4. AFAICT, that'll close out all of the current open issues. I hope this goes some way towards addressing your concerns. -- Peter Geoghegan
Attachment
On Wed, Nov 5, 2014 at 1:09 PM, Peter Geoghegan <pg@heroku.com> wrote:
> Once I fix that (provided it doesn't take too long), I'll publish a
> V1.4. AFAICT, that'll close out all of the current open issues.

Attached is V1.4. As with V1.3, I continue to maintain both approaches to value locking in parallel, believing this to be the most useful direction for development to take for the time being. The consensus is for approach #2 to value locking [1], but I see no reason to deny reviewers the chance to compare both approaches. It's easy to maintain the two, as the value locking implementation is well encapsulated - the executor-level stuff that has been altered in the last few revisions tends to cause very few or no conflicts when rebasing.

Highlights
=======

* Costing of indexes for the purposes of determining which to have arbitrate whether or not the executor takes the alternative path. So, a list of expressions is created during parse analysis, and that list is matched against existing indexes during optimization. It's usually possible to avoid the work of generating paths, because (it seems reasonable to suppose) there are usually 0 or 1 possible indexes in representative cases. If it's 0, we get an error, originating from where we now do this work -- the optimizer.

* EXCLUDED.* (and TARGET.*) pseudo-aliases (compare OLD.* and NEW.* in the context of user-defined rules and conditional triggers) are visible within the auxiliary UPDATE (but not the parent INSERT). See the commit message for details on how that works. In short, we still have a dedicated primnode expression, ExcludedExpr, but it is not ever generated by the raw grammar (it can only be added during the rewriting stage of query processing). It's just a facade, but a perfectly convincing one.
Note that this means that Vars can be referenced from "another RTE" in what is actually a relation scan node of the target:

postgres=# explain INSERT INTO upsert values(1, 'foo') on conflict (key)
           update set val = excluded.val where excluded.val != 'bar';
                               QUERY PLAN
------------------------------------------------------------------------
 Insert on upsert  (cost=0.00..0.01 rows=1 width=0)
   ->  Result  (cost=0.00..0.01 rows=1 width=0)
   ->  Conflict Update on upsert  (cost=0.00..32.99 rows=1591 width=36)
         Filter: ((excluded.val) <> 'bar'::text)
(4 rows)

Here, you're seeing a "Conflict Update" scan (actually, a quasi-hidden sequential scan) on the upsert table that references a Var from the facade excluded.* table/RTE. In fact, the Var is on the target table, but read through our internal expression primnode (ExcludedExpr) so as to get access to the excluded-from-insertion tuple slot during EPQ expression evaluation for the UPDATE.

* postgres_fdw support for the IGNORE variant (provided there was no unique index inference specification - just as with updatable views).

* Documentation clean-up - as I mentioned, I tried to address Simon's concerns here. Also, as you'd expect, the documentation has been fixed up to reflect the new syntax. I'll need to take a pass at updating the UPSERT Wiki page soon, too.

Next steps
========

AFAICT, this revision addresses all open items bar one - the RLS bug, which I could not decide on a fix for. I refer to the RLS issue described on the Wiki [2]. As I mentioned before, I'd really like to get some reviewer time on the executor-level aspects of this, which are relatively new, and have received no scrutiny from anyone else that I'm aware of. This list of items is a good place to start, for those that are interested:

https://wiki.postgresql.org/wiki/UPSERT#Miscellaneous_odd_properties_of_proposed_ON_CONFLICT_patch

My use of the EvalPlanQual() mechanism, and the structure of the plan tree in general, could really use some scrutiny too.
Thanks [1] https://wiki.postgresql.org/wiki/Value_locking#.232._.22Promise.22_heap_tuples_.28Heikki_Linnakangas.29 [2] https://wiki.postgresql.org/wiki/UPSERT#RLS -- Peter Geoghegan
Attachment
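[Editor's note] The conditional update that Peter's EXPLAIN example demonstrates (update only WHERE excluded.val != 'bar') also has a close analogue in SQLite's upsert clause, which can serve as a hedged sketch via Python's sqlite3 module. One important difference: SQLite does not lock the existing row when the WHERE clause isn't satisfied, whereas the patch does.

```python
# Sketch of a conditional ON CONFLICT update, using SQLite (>= 3.24) as a
# stand-in for the unreleased patch. The WHERE clause on DO UPDATE is
# evaluated only after a conflict is found, as in the patch's design.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE upsert (key INTEGER PRIMARY KEY, val TEXT)")
conn.execute("INSERT INTO upsert VALUES (1, 'old')")

# excluded.val is 'bar': the WHERE clause fails, so no update happens.
conn.execute("""
    INSERT INTO upsert VALUES (1, 'bar')
    ON CONFLICT (key) DO UPDATE SET val = excluded.val
    WHERE excluded.val != 'bar'
""")
# excluded.val is 'foo': the WHERE clause passes, so the row is updated.
conn.execute("""
    INSERT INTO upsert VALUES (1, 'foo')
    ON CONFLICT (key) DO UPDATE SET val = excluded.val
    WHERE excluded.val != 'bar'
""")

val = conn.execute("SELECT val FROM upsert WHERE key = 1").fetchone()[0]
print(val)  # 'foo'
```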
On Mon, Nov 10, 2014 at 3:33 PM, Peter Geoghegan <pg@heroku.com> wrote: > * Documentation clean-up - as I mentioned, I tried to address Simon's > concerns here. Also, as you'd expect, the documentation has been fixed > up to reflect the new syntax. I'll need to take a pass at updating the > UPSERT Wiki page soon, too. I should mention that I've updated the Wiki to reflect the current, post-v1.4 state of affairs. That remains the best place to get a relatively quick overview of where things stand with open items, discussion of the patch, etc: https://wiki.postgresql.org/wiki/UPSERT -- Peter Geoghegan
On Mon, Nov 10, 2014 at 3:33 PM, Peter Geoghegan <pg@heroku.com> wrote: > Attached is V1.4. Someone mentioned to me privately that they weren't sure that the question of whether or not RETURNING only projected actually inserted tuples was the right one. Also, I think someone else mentioned this a few months back. I'd like to address this question directly sooner rather than later, and so I've added a note on the Wiki page in relation to this [1]. It's a possible area of concern at this point. Anyway, it wouldn't require much implementation effort to change the behavior so that updated tuples were also projected. In addition, we might also consider the necessity of inventing a mechanism to make apparent whether the tuple was inserted or updated. The discussion needs to happen first, though. [1] https://wiki.postgresql.org/wiki/UPSERT#RETURNING_behavior -- Peter Geoghegan
On 11/20/2014 01:52 AM, Peter Geoghegan wrote:
> On Mon, Nov 10, 2014 at 3:33 PM, Peter Geoghegan <pg@heroku.com> wrote:
> Also, I think someone else mentioned this a few months back.

Yeah, that was me. I think we have three options:

1. Return only inserted tuples
2. Return inserted and updated tuples
3. Return inserted, updated and skipped tuples

To me option 1 is surprising and less useful, since I imagine that in most cases where you do an upsert you do not care whether the tuple was inserted or updated as long as it has the right values after the upsert, and those values are also what I would expect to be returned.

The possible use case I see for option 3 is when you want the values of automatically generated columns but there is actually no work to do if another transaction has already inserted the same row (same according to the unique constraints). But this behavior, even though useful in certain cases, might be surprising.

Andreas
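[Editor's note] Andreas's three RETURNING options can be made concrete with a small, purely hypothetical Python model. Nothing here is real PostgreSQL API; the `upsert` helper, its predicate, and the data are invented solely to show which rows each option would project.

```python
# Hypothetical model of upsert outcomes, to contrast the three proposed
# RETURNING behaviors. "skipped" covers rows merely locked because the
# auxiliary UPDATE's WHERE clause was not satisfied.

def upsert(table, rows, update_pred=lambda existing, new: True):
    """Upsert (key, val) rows into a dict; classify each row's outcome."""
    outcomes = []
    for key, val in rows:
        if key not in table:
            table[key] = val
            outcomes.append((key, val, "inserted"))
        elif update_pred(table[key], val):
            table[key] = val
            outcomes.append((key, val, "updated"))
        else:
            outcomes.append((key, table[key], "skipped"))
    return outcomes

table = {1: "a", 2: "x"}
outcomes = upsert(table, [(1, "b"), (2, "y"), (3, "d")],
                  update_pred=lambda existing, new: existing == "a")

option1 = [o for o in outcomes if o[2] == "inserted"]          # 1 row
option2 = [o for o in outcomes if o[2] in ("inserted", "updated")]  # 2 rows
option3 = outcomes                                             # 3 rows
print(len(option1), len(option2), len(option3))  # 1 2 3
```

Under option 1, the updated row (key 1) silently disappears from RETURNING even though the statement changed it, which is the surprise Andreas describes.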
On Wed, Nov 19, 2014 at 5:37 PM, Andreas Karlsson <andreas@proxel.se> wrote:
> I think we have three options.
>
> 1. Return only inserted tuples
> 2. Return inserted and updated tuples
> 3. Return inserted, updated and skipped tuples
>
> To me option 1 is surprising and less useful since I imagine in most cases
> where you do an upsert you do not care if the tuple was inserted or updated
> as long as it has the right values after the upsert, and these values is
> also what I would expect to be returned.

I can see why you'd say that about option 1. That also seems like an argument against surfacing the distinction directly (through a dedicated hidden column or other expression that RETURNING might reference, say).

> The possible use case I see for option 3 is when you want the values of
> automatically generated columns but there is actually no work to do if
> another transaction had already inserted the same row (same according to the
> unique constraints). But this behavior even though useful in certain cases
> might be surprising.

I think that 3 is out. It seems hard to justify not RETURNING anything in respect of a slot when there is a before row insert trigger that returns NULL on the one hand, but RETURNING something despite not inserting for ON CONFLICT UPDATE on the other.

I think if we do this, we're also going to have to set a command tag. That could look like this:

postgres=# INSERT INTO upsert values(1, 'Foo'), (2, 'Bar')
           ON CONFLICT (key) UPDATE SET val = EXCLUDED.val;
INSERT 0 1
UPDATE 1

Or perhaps like this:

postgres=# INSERT INTO upsert values(1, 'Foo'), (2, 'Bar')
           ON CONFLICT (key) UPDATE SET val = EXCLUDED.val;
UPSERT 0 2

Maybe the latter is better, because it's less likely to break tools that currently parse the command tag. But if we went with the former command tag format, we'd have to figure out if there should always be an "UPDATE part" of INSERT command tags generally, even when there was no ON CONFLICT UPDATE clause.
I guess in that case it would have to become stable/consistent across INSERTs, so we'd always have an "UPDATE part", but I'm not sure. -- Peter Geoghegan
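[Editor's illustration] The two candidate tag formats above differ in how much work they push onto client tools. As a rough sketch of the parsing burden, here is a hypothetical Python helper that tolerates the classic tags as well as both formats floated in this thread; the function name and the final tag format are assumptions, since nothing was settled at this point:

```python
def affected_rows(tag: str) -> int:
    """Sum the trailing row counts from a command tag string.

    Handles the classic forms ("INSERT 0 1", "UPDATE 1") as well as both
    formats floated in this thread ("INSERT 0 1 UPDATE 1" and
    "UPSERT 0 2").  Purely illustrative.
    """
    total = 0
    parts = tag.split()
    i = 0
    while i < len(parts):
        word = parts[i]
        if word in ("INSERT", "UPSERT"):
            # INSERT-style tags carry a legacy OID field before the count
            total += int(parts[i + 2])
            i += 3
        elif word == "UPDATE":
            total += int(parts[i + 1])
            i += 2
        else:
            raise ValueError("unrecognized command tag: %r" % tag)
    return total
```

A tool written this way would survive either choice, which is part of why the single "UPSERT 0 2" tag (one trailing count, same shape as existing tags) looks less disruptive.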
On Wed, Nov 19, 2014 at 6:04 PM, Peter Geoghegan <pg@heroku.com> wrote: > I think that 3 is out. It seems hard to justify not RETURNING anything > in respect of a slot when there is a before row insert trigger that > returns NULL on the one hand, but RETURNING something despite not > inserting for ON CONFLICT UPDATE on the other. I mean: despite not inserting *or updating*. -- Peter Geoghegan
On Wed, 2014-11-19 at 16:52 -0800, Peter Geoghegan wrote:
> Someone mentioned to me privately that they weren't sure that the
> question of whether or not RETURNING only projected actually inserted
> tuples was the right one. Also, I think someone else mentioned this a
> few months back. I'd like to address this question directly sooner
> rather than later, and so I've added a note on the Wiki page in
> relation to this [1]. It's a possible area of concern at this point.

I think the biggest problem with the current approach is that there is no way to know if a row was skipped by the WHERE clause when using INSERT ON CONFLICT UPDATE ... WHERE.

I am a developer of the Django ORM. Django reports to the user whether a row was inserted or updated. It is possible to know which rows were inserted by returning the primary key value: if something is returned, then it was an insert. If Django implements updated-vs-inserted checking this way, and PostgreSQL adds RETURNING for the update case later on, that would be a breaking change for Django.

So, if it is not too hard to implement RETURNING for the update case, then I think it should be done. A pseudo-column informing the user whether the result was an update or an insert would then be a requirement for Django. Changing the RETURNING behavior in later releases might cause problems due to backwards compatibility.

- Anssi
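[Editor's illustration] The inference Anssi describes - "something returned means it was an insert" - can be modeled with a toy in-memory sketch of the v1.x semantics, where RETURNING projects only freshly inserted tuples. This is illustrative Python, not Django or driver code; the table is just a dict:

```python
def upsert_returning(table, key, val):
    """Toy model of INSERT ... ON CONFLICT UPDATE ... RETURNING under the
    semantics discussed here: only inserted tuples are projected."""
    if key in table:
        table[key] = val        # conflict: auxiliary UPDATE path taken
        return []               # updated rows are not projected
    table[key] = val
    return [(key, val)]         # freshly inserted row is projected

def save(table, key, val):
    """Anssi-style inference: a returned row means the save was an insert."""
    return len(upsert_returning(table, key, val)) == 1
```

Under these semantics the inference is sound, but it silently breaks the moment RETURNING starts projecting updated tuples as well - which is exactly the compatibility hazard being pointed out.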
On Wed, Nov 19, 2014 at 10:37 PM, Anssi Kääriäinen <anssi.kaariainen@thl.fi> wrote: > I think the biggest problem with the current approach is that there is > no way to know if a row was skipped by the where clause when using > INSERT ON CONFLICT UPDATE ... WHERE. Well, there could have always been an UPDATE in a trigger or something like that. > I am a developer of the Django ORM. Django reports to the user whether a > row was inserted or updated. It is possible to know which rows were > inserted by returning the primary key value. If something is returned, > then it was an insert. If Django implements updated vs inserted checking > this way, then if PostgreSQL adds RETURNING for update case later on, > that would be a breaking change for Django. How does that actually work at the moment? Do you use RETURNING, or look at the command tag? Would you be happy to just know that certain rows were either inserted or updated in the context of an UPSERT (and not cancelled by a BEFORE ROW INSERT or UPDATE trigger returning NULL), or do you want to specifically know if there was an insert or an update in respect of each row/slot processed? -- Peter Geoghegan
On Thu, Nov 20, 2014 at 1:42 PM, Peter Geoghegan <pg@heroku.com> wrote:
> Would you be happy to just know that certain
> rows were either inserted or updated in the context of an UPSERT (and
> not cancelled by a BEFORE ROW INSERT or UPDATE trigger returning
> NULL)

Of course, having the WHERE clause in the auxiliary UPDATE not pass would also be cause to *not* return/project the not-processed row/slot (in a world where we do something with RETURNING in respect of rows actually processed by the auxiliary UPDATE). I mean, you're seeing the final version of the row when RETURNING with an UPDATE, and if the UPDATE is never evaluated, then the would-be final version (which is generally based on the TARGET tuple and EXCLUDED tuple, as processed by the UPDATE) never exists, and so clearly cannot be projected by RETURNING.

This explanation is a tiny bit misleading, because the rows/slots not affected by the UPDATE (or INSERT) are still *locked*, even when the UPDATE's WHERE clause does not pass - they have been processed to the extent that they were locked. This is also true of postgres_fdw in certain situations, but it seems like a very minor issue.

-- Peter Geoghegan
On Thu, 2014-11-20 at 13:42 -0800, Peter Geoghegan wrote:
> > I am a developer of the Django ORM. Django reports to the user whether a
> > row was inserted or updated. It is possible to know which rows were
> > inserted by returning the primary key value. If something is returned,
> > then it was an insert. If Django implements updated vs inserted checking
> > this way, then if PostgreSQL adds RETURNING for update case later on,
> > that would be a breaking change for Django.
>
> How does that actually work at the moment? Do you use RETURNING, or
> look at the command tag? Would you be happy to just know that certain
> rows were either inserted or updated in the context of an UPSERT (and
> not cancelled by a BEFORE ROW INSERT or UPDATE trigger returning
> NULL), or do you want to specifically know if there was an insert or
> an update in respect of each row/slot processed?

Django currently uses the command tag to check whether a row was updated. We also use RETURNING to get SERIAL values back from the database on insert.

The most likely place to use this functionality in Django is Model.save(). This method is best defined as "make sure this object's state is either inserted or updated in the database by the primary key of the object". The Model.save() method also needs to report whether the model was created or updated. The command tag is sufficient for this case. So, the proposed feature now has everything Django needs for Model.save().

Django might add a bulk_merge(objs) command later on. This is defined as "make sure each obj in objs is persisted to the database using the fastest way available". The INSERT ON CONFLICT UPDATE command looks excellent for this case. Here it will be more problematic to check which rows were inserted and which updated, as we need that information for each primary key value separately.

When I think of this feature outside of Django, it seems completely reasonable to return database-computed values on UPSERT.
This requires two queries with the proposed API. My opinion is that RETURNING for the update case is better than not having it. - Anssi
On Thu, Nov 20, 2014 at 10:58 PM, Anssi Kääriäinen <anssi.kaariainen@thl.fi> wrote:
> Django uses the command tag currently to check if a row was updated. We
> also use RETURNING to get SERIAL values back from the database on
> insert.
>
> The most likely place to use this functionality in Django is
> Model.save(). This method is best defined as "make sure this object's
> state is either inserted or updated to the database by the primary key
> of the object". The Model.save() method needs to also report if the
> model was created or updated. The command tag is sufficient for this
> case.
>
> So, the proposed feature now has everything Django needs for
> Model.save().

So, to be clear, it would be okay if the command tag reported the number of rows *upserted*, without making any distinction between whether they were actually inserted or updated? That seems like the least invasive thing, since top-level commands historically report a single number of affected rows (FWIW I wouldn't report the first OID of a single inserted tuple, as currently happens with the INSERT command tag). The ON CONFLICT IGNORE variant would still report the number of rows inserted, though.

RETURNING would show the resulting rows from either the insert or the update for each slot processed to completion (i.e. actually inserted or updated) by the upsert, without making any user-visible distinction (you asked for an upsert, so you must not care).

> Django might add a bulk_merge(objs) command later on. This is defined as
> "make sure each obj in objs is persisted to the database using the
> fastest way available". The INSERT ON CONFLICT UPDATE command looks
> excellent for this case. In this case it will be more problematic to
> check which rows were inserted, which update, as we need information for
> each primary key value separately for this case.

Cool.

> When I think of this feature outside of Django, it seems it is
> completely reasonable to return database computed values on UPSERT.
This > requires two queries with the proposed API. My opinion is that RETURNING > for the update case is better than not having it. I am almost convinced that that behavior is better. I would like to hear more opinions before looking at adding the necessary changes to the implementation, though. It might be a bit questionable that ON CONFLICT IGNORE and ON CONFLICT UPDATE have different command tags, for example. What do other people think? Should RETURNING project updated tuples as well as inserted tuples, as described here? -- Peter Geoghegan
On Fri, Nov 21, 2014 at 3:38 PM, Peter Geoghegan <pg@heroku.com> wrote: > What do other people think? Should RETURNING project updated tuples as > well as inserted tuples, as described here? I think it should. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Mon, Nov 24, 2014 at 6:26 AM, Robert Haas <robertmhaas@gmail.com> wrote: > On Fri, Nov 21, 2014 at 3:38 PM, Peter Geoghegan <pg@heroku.com> wrote: >> What do other people think? Should RETURNING project updated tuples as >> well as inserted tuples, as described here? > > I think it should. Looks like the consensus is that we should have RETURNING project updated tuples too, then. I've already written the code to do this (and to report an "UPSERT" command tag), which is very straightforward. The next revision will have this behavior. However, I'm going to wait a little while longer before formally publishing a new revision. -- Peter Geoghegan
On Mon, Nov 24, 2014 at 1:03 PM, Peter Geoghegan <pg@heroku.com> wrote:
> Looks like the consensus is that we should have RETURNING project
> updated tuples too, then.

Attached revision, v1.5, establishes this behavior (as always, there is a variant for each approach to value locking). There is a new commit with a commit message describing the new RETURNING/command tag behavior in detail, so no need to repeat it here. The documentation has been updated in these areas, too. There are also one or two tiny comment tweaks here and there, as well as a pg_proc OID collision fix in the value locking approach #1 variant.

My mirror of the documentation (i.e. an html build) has been updated.

INSERT command documentation (for the new RETURNING behavior):
http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/on-conflict-docs/sql-insert.html

Details of changes to the command tag:
http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/on-conflict-docs/protocol-message-formats.html

I'll make a pass at the Wiki page to reflect these changes soon.

-- Peter Geoghegan
On Wed, 2014-11-26 at 16:59 -0800, Peter Geoghegan wrote:
> On Mon, Nov 24, 2014 at 1:03 PM, Peter Geoghegan <pg@heroku.com> wrote:
> > Looks like the consensus is that we should have RETURNING project
> > updated tuples too, then.
>
> Attached revision, v1.5, establishes this behavior (as always, there
> is a variant for each approach to value locking). There is a new
> commit with a commit message describing the new RETURNING/command tag
> behavior in detail, so no need to repeat it here. The documentation
> has been updated in these areas, too.

It seems there isn't any way to distinguish between an insert and an update of a given row. Maybe a pseudo-column can be added so that it can be used in the RETURNING clause:

insert into foobar(id, other_col) values(2, '2') on conflict (id) update set other_col=excluded.other_col returning id, pseudo.was_updated;

This would ensure that users could check, for each primary key value, whether the row was updated or inserted. Of course, the pseudo.was_updated name should be replaced by something better.

It would be nice to be able to skip updates of rows that were not changed:

insert into foobar values(2, '2') on conflict (id) update set other_col=excluded.other_col where target is distinct from excluded;

- Anssi Kääriäinen
On 12/04/2014 07:07 PM, Anssi Kääriäinen wrote: > On Wed, 2014-11-26 at 16:59 -0800, Peter Geoghegan wrote: >> On Mon, Nov 24, 2014 at 1:03 PM, Peter Geoghegan <pg@heroku.com> wrote: >>> Looks like the consensus is that we should have RETURNING project >>> updated tuples too, then. >> >> Attached revision, v1.5, establishes this behavior (as always, there >> is a variant for each approach to value locking). There is a new >> commit with a commit message describing the new RETURNING/command tag >> behavior in detail, so no need to repeat it here. The documentation >> has been updated in these areas, too. > > It seems there isn't any way to distinguish between insert and update of > given row. Maybe a pseudo-column can be added so that it can be used in > the returning statement Yes, I think that's pretty important. With a negative attno so it's treated as a "hidden" col that must be explicitly named to be shown and won't be confused with user columns. -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Dec 4, 2014 at 3:04 AM, Craig Ringer <craig@2ndquadrant.com> wrote: > Yes, I think that's pretty important. With a negative attno so it's > treated as a "hidden" col that must be explicitly named to be shown and > won't be confused with user columns. I think that the standard for adding a new system attribute ought to be enormous. The only case where a new one was added post-Postgres95 was "tableoid". I'm pretty sure that others aren't going to want to do it that way. Besides, I'm not entirely convinced that this is actually an important distinction to expose. -- Peter Geoghegan
On Thu, 2014-12-04 at 10:27 -0800, Peter Geoghegan wrote:
> I think that the standard for adding a new system attribute ought to
> be enormous. The only case where a new one was added post-Postgres95
> was "tableoid". I'm pretty sure that others aren't going to want to do
> it that way. Besides, I'm not entirely convinced that this is actually
> an important distinction to expose.

For Django's use case this is a requirement. We must inform the user if the save() action created a new row or if it modified an existing one.

Another way to do this would be to expose the "excluded" alias in the returning clause. All columns of the excluded alias would be null in the case of insert (especially the primary key column), and thus if a query

insert into foobar values(2, '2') on conflict (id) update set other_col=excluded.other_col returning excluded.id

returns a non-null value, then it was an update.

- Anssi
On Thu, Dec 4, 2014 at 10:27 PM, Anssi Kääriäinen <anssi.kaariainen@thl.fi> wrote:
> For Django's use case this is a requirement. We must inform the user if
> the save() action created a new row or if it modified an existing one.

Can you explain that in more detail, please?

> Another way to do this would be to expose the "excluded" alias in the
> returning clause. All columns of the excluded alias would be null in
> the case of insert (especially the primary key column), and thus if a
> query
> insert into foobar values(2, '2') on conflict (id) update set other_col=excluded.other_col returning excluded.id
> returns a non-null value, then it was an update.

I don't like that idea much, TBH. Consider this:

postgres=# update upsert u set key = 1 from upsert i returning key;
ERROR:  42702: column reference "key" is ambiguous
LINE 1: update upsert u set key = 1 from upsert i returning key;
                                                            ^

So, suppose this example was actually an ON CONFLICT UPDATE query. If I similarly make the aliases in the ON CONFLICT UPDATE ("target"/"excluded") visible in the returning list, it becomes necessary to qualify every column - an ambiguity is introduced by making both aliases visible, since any non-qualified column in the RETURNING clause could be from either the "target" or "excluded" alias/RTE. This is particularly annoying for the common, simple cases.

Also, when there was an update in respect of any given slot, how, in general, can I be sure that *any* visible excluded.* attribute is not null (which you suggest as a reliable proxy for the update path having been taken)? For one thing, the unique index that arbitrates whether or not we take the "alternative path" is not restricted to covering only non-nullable attributes. So does the user end up specifying system/hidden attributes, just to make what you outline work? That seems sort of messy.

-- Peter Geoghegan
On Fri, 2014-12-05 at 00:21 -0800, Peter Geoghegan wrote:
> On Thu, Dec 4, 2014 at 10:27 PM, Anssi Kääriäinen
> <anssi.kaariainen@thl.fi> wrote:
> > For Django's use case this is a requirement. We must inform the user if
> > the save() action created a new row or if it modified an existing one.
>
> Can you explain that in more detail, please?

Django has a post_save signal. The signal provides information about the save operation. One piece of information is a created boolean flag: when True, the operation was an insert; when False, it was an update. See https://docs.djangoproject.com/en/1.7/ref/signals/#django.db.models.signals.post_save for details.

The created flag is typically used to perform some related action. An example is the User and UserProfile models. Each user must have a UserProfile, so post_save can be used to create an empty user profile on creation of a user.

If Django is going to use the INSERT ... ON CONFLICT UPDATE variant for the existing save() method, then it needs to know if the result was an UPDATE or an INSERT. If we are going to use this for other operations (for example, a bulk merge of rows into the database), it would be very convenient to have per-row updated/created information available, so that we can fire the post_save signals for the rows. If we don't have that information available, it means we can't fire signals, and no signals means we can't use the bulk merge operation internally, as we have to fire the signals wherever they were fired before.

Outside of Django there are likely similar reasons to want to know if the result of an operation was the creation of a new row. The reason could be creation of a related row, some action in the application layer, or just a UI message telling "object created successfully" vs "object updated successfully".

- Anssi
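[Editor's illustration] For readers unfamiliar with Django, the dependency on a per-row created flag can be sketched with a toy signal dispatcher. This stands in for Django's actual post_save machinery; the class and receiver here are illustrative, not Django code:

```python
# Minimal stand-in for Django's signal machinery, showing why receivers
# need a per-row inserted/updated distinction: they branch on `created`.
class Signal:
    def __init__(self):
        self._receivers = []

    def connect(self, receiver):
        self._receivers.append(receiver)

    def send(self, instance, created):
        for receiver in self._receivers:
            receiver(instance=instance, created=created)

post_save = Signal()
profiles = []

def make_profile(instance, created):
    # e.g. create an empty UserProfile only for brand-new Users
    if created:
        profiles.append("profile for %s" % instance)

post_save.connect(make_profile)
post_save.send("alice", created=True)   # insert path: profile created
post_save.send("alice", created=False)  # update path: no new profile
```

If the upsert statement cannot report which rows were inserted and which were updated, there is no way to fill in `created` for each row, and signal-firing code paths cannot use it.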
On Thu, Dec 4, 2014 at 1:27 PM, Peter Geoghegan <pg@heroku.com> wrote: > On Thu, Dec 4, 2014 at 3:04 AM, Craig Ringer <craig@2ndquadrant.com> wrote: >> Yes, I think that's pretty important. With a negative attno so it's >> treated as a "hidden" col that must be explicitly named to be shown and >> won't be confused with user columns. > > I think that the standard for adding a new system attribute ought to > be enormous. The only case where a new one was added post-Postgres95 > was "tableoid". I'm pretty sure that others aren't going to want to do > it that way. +1. System attributes are a mess; a negative attribute number implies not only that the column is by default hidden from view but also that it is stored in the tuple header rather than the tuple data. Using one here, where we're not even talking about a property of the tuple per se, would be a pretty goofy solution. > Besides, I'm not entirely convinced that this is actually > an important distinction to expose. I think it's probably an important distinction, for the kinds of reasons Anssi mentions, but we should look for some method other than a system column of indicating it. Maybe there's a magic function that returns a Boolean which you can call, or maybe some special clause, as with WITH ORDINALITY. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 12/05/2014 07:59 AM, Robert Haas wrote: > I think it's probably an important distinction, for the kinds of > reasons Anssi mentions, but we should look for some method other than > a system column of indicating it. Maybe there's a magic function that > returns a Boolean which you can call, or maybe some special clause, as > with WITH ORDINALITY. I thought the point of INSERT ... ON CONFLICT update was so that you didn't have to care if it was a new row or not? If you do care, it seems like it makes more sense to do your own INSERTs and UPDATEs, as Django currently does. I wouldn't be *opposed* to having a pseudocolumn in the RETURNed stuff which let me know updated|inserted|ignored, but I also don't see it as a feature requirement for 9.5. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On Fri, Dec 5, 2014 at 10:00 AM, Josh Berkus <josh@agliodbs.com> wrote:
> I thought the point of INSERT ... ON CONFLICT update was so that you
> didn't have to care if it was a new row or not?
>
> If you do care, it seems like it makes more sense to do your own INSERTs
> and UPDATEs, as Django currently does.
>
> I wouldn't be *opposed* to having a pseudocolumn in the RETURNed stuff
> which let me know updated|inserted|ignored, but I also don't see it as a
> feature requirement for 9.5.

Agreed. Importantly, we won't have painted ourselves into a corner where we cannot add it later, now that RETURNING projects updated tuples, too (v1.5 established that). I'm pretty confident that it would be a mistake to try and make the inner "excluded" and "target" aliases visible, in any case, because of the annoying ambiguity it creates for the common, simple cases. I think it has to be a magic function, or something like that.

I really hoped we'd be having a more technical, implementation-level discussion at this point. Simon rightly emphasized the importance of getting the semantics right, and the importance of discussing that up front, but I'm concerned; that's almost all that has been discussed here so far, which is surely not balanced either.

-- Peter Geoghegan
On Fri, Dec 5, 2014 at 1:07 PM, Peter Geoghegan <pg@heroku.com> wrote:
> Agreed. Importantly, we won't have painted ourselves into a corner
> where we cannot add it later, now that RETURNING projects updated
> tuples, too (v1.5 established that).

Yeah, it seems fine to postpone that to a later version, as long as we haven't painted ourselves into a corner.

-- Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Fri, Dec 5, 2014 at 1:01 AM, Anssi Kääriäinen <anssi.kaariainen@thl.fi> wrote: > If Django is going to use the INSERT ... ON CONFLICT UPDATE variant in > Django for the existing save() method, then it needs to know if the > result was an UPDATE or INSERT. If we are going to use this for other > operations (for example bulk merge of rows to the database), it would be > very convenient to have per-row updated/created information available so > that we can fire the post_save signals for the rows. If we don't have > that information available, it means we can't fire signals, and no > signals means we can't use the bulk merge operation internally as we > have to fire the signals where that happened before. > > Outside of Django there are likely similar reasons to want to know if > the result of an operation was a creation of a new row. The reason could > be creation of related row, doing some action in application layer, or > just UI message telling "object created successfully" vs "object updated > successfully". It probably isn't ideal, but you'd at least be able to do something with row level triggers in the absence of a standard way of directly telling if an insert or update was performed. -- Peter Geoghegan
On Fri, 2014-12-05 at 10:00 -0800, Josh Berkus wrote: > I thought the point of INSERT ... ON CONFLICT update was so that you > didn't have to care if it was a new row or not? > > If you do care, it seems like it makes more sense to do your own INSERTs > and UPDATEs, as Django currently does. Django tries to update the object if it already exists in the database. If it doesn't, then Django does an insert. This is suboptimal from concurrency standpoint, and does two round trips to the database instead of just one. For Django, both insert and update are OK when saving an object to the database, but Django needs to know which one was done. I too agree that this doesn't need to be handled in the first version of the patch. - Anssi
Attached revision, v1.6, slightly tweaks the ordering of per-statement trigger execution. The ordering is now explicitly documented (the html mirror has been updated: http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/on-conflict-docs/trigger-definition.html). As always, there is a variant for each approach to value locking.

This revision fixes bitrot that developed when the patchset was applied on master's tip, and also cleans up comments regarding how the parent insert carries auxiliary/child state through all stages of query processing. That structure should be clearer now, including how setrefs.c has the auxiliary/child ModifyTable use the same resultRelation as its parent.

-- Peter Geoghegan
On Mon, Dec 8, 2014 at 8:16 PM, Peter Geoghegan <pg@heroku.com> wrote:
> Attached revision, v1.6, slightly tweaks the ordering of per-statement
> trigger execution.

Right now, there is no way for a before row insert/update trigger to determine whether it was called as part of an INSERT ... ON CONFLICT UPDATE or not. It's also not possible for a DO INSTEAD trigger on a view (a before row insert trigger) to determine that it was called specifically due to an INSERT ... IGNORE (which I think ought to imply that any corresponding, "redirected" insertion into a table should also use IGNORE... that's at least going to be something that a certain number of apps will need to be made robust against).

The question is: do we want to expose this distinction to triggers? The natural way to do so would probably be to add a TG_SPECULATIVE special variable to plpgsql (and equivalent variables in other PLs). This text variable would be either "UPSERT" or "IGNORE"; it would be NULL when not applicable (e.g. with traditional INSERTs).

How do people feel about this? Is it important to include this in our initial cut of the feature? I thought that I'd avoid that kind of thing until prompted to address it by others, since it probably won't end up being a common concern, but I'd like to hear a few opinions.

-- Peter Geoghegan
On Thu, Dec 11, 2014 at 1:11 AM, Peter Geoghegan <pg@heroku.com> wrote: > On Mon, Dec 8, 2014 at 8:16 PM, Peter Geoghegan <pg@heroku.com> wrote: >> Attached revision, v1.6, slightly tweaks the ordering of per-statement >> trigger execution. > > Right now, there is no way for a before row insert/update trigger to > determine whether it was called as part of an INSERT ... ON CONFLICT > UPDATE or not. It's also not possible for a DO INSTEAD trigger on a > view (a before row insert trigger) to determine that it was called > specifically due to an INSERT...IGNORE (which I think ought to imply > that any corresponding, "redirected" insertion into a table should > also use IGNORE....that's at least going to be something that a > certain number of apps will need to be made robust against). > > The question is: Do we want to expose this distinction to triggers? > The natural way to do so would probably be to add TG_SPECULATIVE > special variable to plpgsql (and equivalent variables in other PLs). > This text variable would be either "UPSERT" or "IGNORE"; it would be > NULL when it was not applicable (e.g. with traditional INSERTs). > > How do people feel about this? Is it important to include this in our > initial cut of the feature? I thought that I'd avoid that kind of > thing until prompted to address it by others, since it probably won't > end up being a common concern, but I'd like to hear a few opinions. It's probably something we should add, but there's enough to do getting the basic feature working first. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Fri, Dec 12, 2014 at 6:39 AM, Robert Haas <robertmhaas@gmail.com> wrote: > It's probably something we should add, but there's enough to do > getting the basic feature working first. Moving this patch to CF 2014-12 as work is still going on. -- Michael
On Mon, Dec 8, 2014 at 8:16 PM, Peter Geoghegan <pg@heroku.com> wrote:
> Attached revision, v1.6, slightly tweaks the ordering of per-statement
> trigger execution. The ordering is now explicitly documented (the html
> mirror has been updated:
> http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/on-conflict-docs/trigger-definition.html).
> As always, there is a variant for each approach to value locking.
> This revision fixes bitrot that developed when the patchset was
> applied on master's tip, and also cleans up comments regarding how the
> parent insert carries auxiliary/child state through all stages of
> query processing. That should structure be clearer now, including how
> setrefs.c has the auxiliary/child ModifyTable use the same
> resultRelation as its parent.
If I build either option of the patch under MinGW, I get an error in the grammar files related to the IGNORE reserved word.
$ (./configure --host=x86_64-w64-mingw32 --without-zlib && make && make check) > /dev/null
In file included from ../../../src/include/parser/gramparse.h:29:0,
from gram.y:59:
../../../src/include/parser/gram.h:207:6: error: expected identifier before numeric constant
In file included from gram.y:14366:0:
I don't get this problem on Linux.
The build chain seems to meet the specified minimum:
flex.exe 2.5.35
bison (GNU Bison) 2.4.2
This is perl, v5.8.8 built for msys-64int
It seems like IGNORE is getting replaced by the preprocessor with something else, but I don't know how to get my hands on the intermediate file after the preprocessor has done its thing.
Also, in both Linux and MinGW under option 1 patch I get an OID conflict on OID 3261.
Cheers,
Jeff
Jeff Janes <jeff.janes@gmail.com> writes: > It seems like IGNORE is getting replaced by the preprocessor with something > else, but I don't know how to get my hands on the intermediate file after > the preprocessor has done its thing. Maybe IGNORE is defined as a macro in MinGW? Try s/IGNORE/IGNORE_P/g throughout the patch. regards, tom lane
On Mon, Dec 15, 2014 at 4:22 PM, Jeff Janes <jeff.janes@gmail.com> wrote: > Also, in both Linux and MinGW under option 1 patch I get an OID conflict on > OID 3261. I'll take a pass at fixing this bitrot soon. I'll follow Tom's advice about macro collisions on MinGW while I'm at it, since his explanation seems plausible. -- Peter Geoghegan
On Mon, Dec 15, 2014 at 4:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Jeff Janes <jeff.janes@gmail.com> writes: >> It seems like IGNORE is getting replaced by the preprocessor with something >> else, but I don't know how to get my hands on the intermediate file after >> the preprocessor has done its thing. > > Maybe IGNORE is defined as a macro in MinGW? > Try s/IGNORE/IGNORE_P/g throughout the patch. BTW, the gcc -E flag does this. So figure out what exact arguments MinGW's gcc is passed in the ordinary course of compiling gram.c, and prepend "-E" to the list of existing flags while manually executing gcc -- that should let you know exactly what's happening here. -- Peter Geoghegan
On Mon, Dec 15, 2014 at 4:59 PM, Peter Geoghegan <pg@heroku.com> wrote:
> On Mon, Dec 15, 2014 at 4:22 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
>> Also, in both Linux and MinGW under option 1 patch I get an OID conflict on
>> OID 3261.
>
> I'll take a pass at fixing this bitrot soon. I'll follow Tom's advice
> about macro collisions on MinGW while I'm at it, since his explanation
> seems plausible.

The attached pair of revised patch sets fixes the OID collision, and presumably fixes the MinGW issue (because IGNORE_P is now used as the token name). It also polishes approach #2 to value locking in a few places (e.g. better comments). Finally, both patches fix a minor buglet in EXPLAIN ANALYZE output -- the output now indicates whether tuples are pulled up from auxiliary update nodes.

-- Peter Geoghegan
On Mon, Dec 15, 2014 at 5:05 PM, Peter Geoghegan <pg@heroku.com> wrote:
>> Maybe IGNORE is defined as a macro in MinGW?
>> Try s/IGNORE/IGNORE_P/g throughout the patch.
>
> BTW, the gcc -E flag does this. So figure out what exact arguments
> MinGW's gcc is passed in the ordinary course of compiling gram.c, and
> prepend "-E" to the list of existing flags while manually executing
> gcc -- that should let you know exactly what's happening here.
Yep, I tried that trick and had decided it didn't work in MinGW. But I think it was user error--I must have somehow broken the build tree, and 'make' didn't detect the problem. Now I see that IGNORE is getting turned into 0.
Your new version 1.7 of the patches fixes that issue, as well as the OID conflict.
Thanks,
Jeff
On Tue, Dec 16, 2014 at 11:08 AM, Jeff Janes <jeff.janes@gmail.com> wrote: > Your new version 1.7 of the patches fixes that issue, as well as the OID > conflict. Good. You're probably aware that I maintain a stress testing suite for the patch here: https://github.com/petergeoghegan/upsert In the past, you've had a lot of success with coming up with stress tests that find bugs. Maybe you can come up with some improvements to the suite, if you'd care to test the patch. I can authorize your Github account to push code to that repo, if you're interested. -- Peter Geoghegan
On 17/12/14 10:11, Peter Geoghegan wrote: > On Tue, Dec 16, 2014 at 11:08 AM, Jeff Janes <jeff.janes@gmail.com> wrote: >> Your new version 1.7 of the patches fixes that issue, as well as the OID >> conflict. > Good. > > You're probably aware that I maintain a stress testing suite for the > patch here: https://github.com/petergeoghegan/upsert > > In the past, you've had a lot of success with coming up with stress > tests that find bugs. Maybe you can come up with some improvements to > the suite, if you'd care to test the patch. I can authorize your > Github account to push code to that repo, if you're interested. Yeah! I have just released some prototype software (not related to pg), and I'm going to tell people to treat it with extreme suspicion, no matter how much they may respect the developer (me)! Like Pg, though, it is critical that it records data reliably. Also, both need testing to try to detect intermittent errors (I already found one myself in the prototype - fortunately, not so critical that it needs to be fixed in the prototype, but it would have to be eliminated from the production version!). So I think it really is great to encourage people to come up with demanding tests, especially automated stress testing for pg. Cheers, Gavin (Who wishes he had the time & experience to contribute to pg.)
It looks like we are close to reaching consensus on the syntax. Phew! Thanks for maintaining the wiki pages and the documentation. All of the below is based on those, I haven't looked at the patch itself yet. The one thing that I still feel uneasy about is the Unique Index Inference thing. Per the syntax example from the wiki page, the UPSERT statement looks like this: INSERT INTO upsert(key, val) VALUES(1, 'insert') ON CONFLICT (key) IGNORE; With ON CONFLICT IGNORE, the list of key columns can also be left out: INSERT INTO upsert(key, val) VALUES(1, 'insert') ON CONFLICT IGNORE; The documentation says that: > Omitting the specification indicates a total indifference to where > any would-be uniqueness violation could occur, which isn't always > appropriate; at times, it may be desirable for ON CONFLICT IGNORE to > not suppress a duplicate violation within an index where that isn't > explicitly anticipated. Note that ON CONFLICT UPDATE assignment may > result in a uniqueness violation, just as with a conventional > UPDATE. Some questions: 1. Does that mean that if you leave out the key columns, the insertion is IGNOREd if it violates *any* unique key constraint? 2. If you do specify the key columns, then the IGNORE path is taken only if the insertion violates a unique key constraint on those particular columns. Otherwise an error is thrown. Right? Now, let's imagine a table like this: CREATE TABLE persons ( username text unique, real_name text unique, data text ); Is there any way to specify both of those constraints, so that the insertion is IGNOREd if it violates either one of them? If you try to do: INSERT INTO persons(username, real_name, data) VALUES('foobar', 'foo bar') ON CONFLICT (username, real_name) IGNORE; It will fail because there is no unique index on (username, real_name). 
In this particular case, you could leave out the specification, but if there was a third constraint that you're not expecting to conflict with, you would want violations of that constraint to still throw an error. And you can't leave out the specification with ON CONFLICT UPDATE anyway. 3. Why is the specification required with ON CONFLICT UPDATE, but not with ON CONFLICT IGNORE? 4. What happens if there are multiple unique indexes with identical columns, and you give those columns in the inference specification? Doesn't matter which index you use, I guess, if they're all identical, but see next question. 5. What if there are multiple unique indexes with the same columns, but different operator classes? 6. Why are partial unique indexes not supported as arbitrators? - Heikki
On 12/17/2014 01:12 PM, Heikki Linnakangas wrote: > 3. Why is the specification required with ON CONFLICT UPDATE, but not > with ON CONFLICT IGNORE? Well, UPDATE has to know which row to lock, no? IGNORE does not. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On Wed, Dec 17, 2014 at 1:12 PM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote: > It looks like we are close to reaching consensus on the syntax. Phew! Thanks > for maintaining the wiki pages and the documentation. All of the below is > based on those, I haven't looked at the patch itself yet. Great, thanks! Yes, I am relieved that we appeared to have agreed on a syntax. > The one thing that I still feel uneasy about is the Unique Index Inference > thing. > The documentation says that: > >> Omitting the specification indicates a total indifference to where >> any would-be uniqueness violation could occur, which isn't always >> appropriate; > Some questions: > > 1. Does that mean that if you leave out the key columns, the insertion is > IGNOREd if it violates *any* unique key constraint? Yes. This is particularly important for the implementation of things like IGNORE's updatable view support. More generally, for various ETL use cases it's possible to imagine the user simply not caring. > 2. If you do specify the key columns, then the IGNORE path is taken only if > the insertion violates a unique key constraint on those particular columns. > Otherwise an error is thrown. Right? That's right. > Now, let's imagine a table like this: > > CREATE TABLE persons ( > username text unique, > real_name text unique, > data text > ); > > Is there any way to specify both of those constraints, so that the insertion > is IGNOREd if it violates either one of them? If you try to do: > > INSERT INTO persons(username, real_name, data) > VALUES('foobar', 'foo bar') > ON CONFLICT (username, real_name) IGNORE; > > It will fail because there is no unique index on (username, real_name). In > this particular case, you could leave out the specification, but if there > was a third constraint that you're not expecting to conflict with, you would > want violations of that constraint to still throw an error. And you can't > leave out the specification with ON CONFLICT UPDATE anyway. 
Good point. For the IGNORE case: I guess the syntax just isn't that flexible. I agree that that isn't ideal. For the UPDATE case: Suppose your example was an UPDATE where we simply assigned the excluded.data value to the data column in the auxiliary UPDATE's targetlist. What would the user really be asking for with that command, at a really high level? It seems like they might actually want to run two UPSERT commands (one for username, the other for real_name), or rethink their indexing strategy - in particular, whether it's appropriate that there isn't a composite unique constraint on (username, real_name). Now, suppose that by accident or by convention it will always be possible for a composite unique index to be built on (username, real_name) - no dup violations would be raised if it was attempted, but it just hasn't been and won't be. In other words, it's generally safe to actually pretend that there is one. Then, surely it doesn't matter if the user picks one or the other unique index. It'll all work out when the user assigns to both in the UPDATE targetlist, because of the assumed convention that I think is implied by the example. If the convention is violated, at least you get a dup violation letting you know (iff you bothered to assign). But I wouldn't like to encourage that pattern. I think that the long and the short of it is that you really ought to have one unique index as an arbiter in mind when writing a DML statement for the UPDATE variant. Relying on this type of convention is possible, I suppose, but ill-advised. > 3. Why is the specification required with ON CONFLICT UPDATE, but not with > ON CONFLICT IGNORE? That was a fairly recent decision, taken mainly to keep Kevin happy -- although TBH I don't recall that he was particularly insistent on that restriction. I could still go either way on that question. The idea, as I mentioned, is that it's legitimate to not care where a dup violation might occur for certain ETL use cases. 
For UPSERT, though, the only argument for not making it mandatory is that that's something extra to type, and lazy people would prefer not to bother. This is because we assume that the first dup violation is the only possible one without the unique index inference clause. If we don't have the index expressions with which to infer an arbiter unique index (as with MySQL's ON DUPLICATE KEY UPDATE), you'd better be sure that you accounted for all possible sources of would-be duplicate violations - otherwise a random row will be updated! That isn't a fantastic argument for not making a unique index inference clause mandatory, but it might be an okay one. > 4. What happens if there are multiple unique indexes with identical columns, > and you give those columns in the inference specification? Doesn't matter > which index you use, I guess, if they're all identical, but see next > question. It doesn't really matter which one you pick, but right now, at the urging of Robert, we cost the list of candidates and pick the cheapest iff there is more than one [1] (and error if there are none). This is roughly similar to costing of indexes for CLUSTER, and occurs during optimization (only parse analysis of the expressions associated with the unique index inference clause occurs during parse analysis - indexes are looked up and matched in the optimizer). > 5. What if there are multiple unique indexes with the same columns, but > different operator classes? I thought about that. I am reusing a little bit of the CREATE INDEX infrastructure for raw parsing, and for a small amount of parse analysis (conveniently, this makes the command reject things like aggregate functions with no additional code - the error messages only mention "index expressions", so I believe that's fine). This could include an opclass specification, but right now non-default opclasses are rejected during extra steps in parse analysis, for no particular reason. 
I could easily have the unique index inference specification accept a named opclass, if you thought that was important, and you thought naming a non-default opclass by name was a good SQL interface. It would take only a little effort to support non-default opclasses. > 6. Why are partial unique indexes not supported as arbitrators? Robert and I discussed this quite a bit -- it was the argument for being able to name a unique index by name (not that I'm very happy with that idea or anything) [2]. Basically, dealing with the possible behaviors with before row insert triggers might in general greatly complicate the implementation, even though the issues we'd then be protected against would seldom arise. Robert seemed to think that we could revisit this in a future version [3]. Note that IGNORE will still IGNORE any partial unique index -- it just won't accept one as the sole arbiter of whether or not the IGNORE path should be taken (so it's really the inference specification syntax that doesn't accept partial unique indexes, just as it doesn't accept updatable views, exclusion constraints, and inheritance parents where the semantics are similarly iffy -- that's both the reason for and the mechanism by which ON CONFLICT UPDATE does not support these things). [1] http://www.postgresql.org/message-id/CAM3SWZQz+jYkwfuZvcSf0qtpa2QiY+8NGNcHjfWgz3DDzRfzEg@mail.gmail.com [2] http://www.postgresql.org/message-id/CAM3SWZQ8tDPdjiwj_FW4AO8gEvpyiixwBE67OVQuufPJ+y1e1g@mail.gmail.com [3] http://www.postgresql.org/message-id/CA+TgmoZgLgY2PBAMTY3T1jpYXAvNL-w=T6o+6pMqrVR+Vn-iyg@mail.gmail.com -- Peter Geoghegan
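For concreteness, the sort of partial unique index at issue might look like this (a hypothetical schema, not taken from the patch or its regression tests):

```sql
-- Hypothetical example of a partial unique index. Under the patch as
-- described, the inference specification cannot name this index as the sole
-- arbiter, although the IGNORE variant will still suppress duplicate
-- violations arising within it.
CREATE TABLE tasks (
    id      serial PRIMARY KEY,
    state   text NOT NULL,
    payload text NOT NULL
);
CREATE UNIQUE INDEX one_active_per_payload
    ON tasks (payload) WHERE state = 'active';
```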
On 12/18/2014 01:02 AM, Peter Geoghegan wrote: > On Wed, Dec 17, 2014 at 1:12 PM, Heikki Linnakangas > <hlinnakangas@vmware.com> wrote: >> Now, let's imagine a table like this: >> >> CREATE TABLE persons ( >> username text unique, >> real_name text unique, >> data text >> ); >> >> Is there any way to specify both of those constraints, so that the insertion >> is IGNOREd if it violates either one of them? If you try to do: >> >> INSERT INTO persons(username, real_name, data) >> VALUES('foobar', 'foo bar') >> ON CONFLICT (username, real_name) IGNORE; >> >> It will fail because there is no unique index on (username, real_name). In >> this particular case, you could leave out the specification, but if there >> was a third constraint that you're not expecting to conflict with, you would >> want violations of that constraint to still throw an error. And you can't >> leave out the specification with ON CONFLICT UPDATE anyway. > > Good point. > > For the IGNORE case: I guess the syntax just isn't that flexible. I > agree that that isn't ideal. It should be simple to allow multiple key specifications: INSERT INTO persons (username, real_name, data) VALUES('foobar', 'foo bar') ON CONFLICT (username), (real_name) IGNORE; It's a rather niche use case, but might as well support it for the sake of completeness. > For the UPDATE case: Suppose your example was an UPDATE where we > simply assigned the excluded.data value to the data column in the > auxiliary UPDATE's targetlist. What would the user really be asking > for with that command, at a really high level? It seems like they > might actually want to run two UPSERT commands (one for username, the > other for real_name), or rethink their indexing strategy - in > particular, whether it's appropriate that there isn't a composite > unique constraint on (username, real_name). 
> > Now, suppose that by accident or by convention it will always be > possible for a composite unique index to be built on (username, > real_name) - no dup violations would be raised if it was attempted, > but it just hasn't been and won't be. In other words, it's generally > safe to actually pretend that there is one. Then, surely it doesn't > matter if the user picks one or the other unique index. It'll all work > out when the user assigns to both in the UPDATE targetlist, because of > the assumed convention that I think is implied by the example. If the > convention is violated, at least you get a dup violation letting you > know (iff you bothered to assign). But I wouldn't like to encourage > that pattern. > > I think that the long and the short of it is that you really ought to > have one unique index as an arbiter in mind when writing a DML > statement for the UPDATE variant. Relying on this type of convention > is possible, I suppose, but ill-advised. Another thought is that you might want to specify a different action depending on which constraint is violated: INSERT INTO persons (username, real_name, data) VALUES('foobar', 'foo bar') ON CONFLICT (username) IGNORE ON CONFLICT (real_name) UPDATE ...; Although that leaves the question of what to do if both are violated. Perhaps: INSERT INTO persons (username, real_name, data) VALUES('foobar', 'foo bar') ON CONFLICT (username, real_name) IGNORE ON CONFLICT (real_name) UPDATE username = excluded.username; ON CONFLICT (username) UPDATE real_name = excluded.real_name; >> 5. What if there are multiple unique indexes with the same columns, but >> different operator classes? > > I thought about that. I am reusing a little bit of the CREATE INDEX > infrastructure for raw parsing, and for a small amount of parse > analysis (conveniently, this makes the command reject things like > aggregate functions with no additional code - the error messages only > mention "index expressions", so I believe that's fine). 
This could > include an opclass specification, but right now non-default opclasses > are rejected during extra steps in parse analysis, for no particular > reason. > > I could easily have the unique index inference specification accept a > named opclass, if you thought that was important, and you thought > naming a non-default opclass by name was a good SQL interface. It > would take only a little effort to support non-default opclasses. It's a little weird to mention an opclass by name. It's similar to naming an index by name, really. How about naming the operator? For an exclusion constraint, that would be natural, as the syntax to create an exclusion constraint in the first place is "EXCLUDE USING gist (c WITH &&)" Naming the index by columns makes sense in most cases, and I don't like specifying the index's name, but how about allowing naming a constraint? Indexes are just an implementation detail, but constraints are not. Unique and exclusion constraints are always backed by an index, so there is little difference in practice, but I would feel much more comfortable mentioning constraints by name than indexes. Most people would list the columns, but if there is a really bizarre constraint, with non-default opclasses, or an exclusion constraint, it's probably been given a name that you could use. In theory, with the promise tuple approach to locking, you don't necessarily even need an index to back up the constraint. You could just do a sequential scan of the whole table to see if there are any conflicting rows, then insert the row, and perform another scan to see if any conflicting rows appeared in the meantime. Performance would suck, and there is no guarantee that another backend doesn't do a regular INSERT into to the table that violates the imaginary constraint, so this is pretty useless in practice. So probably better to not allow it. - Heikki
Heikki Linnakangas <hlinnakangas@vmware.com> wrote: > INSERT INTO persons (username, real_name, data) > VALUES('foobar', 'foo bar') > ON CONFLICT (username), (real_name) IGNORE; > INSERT INTO persons (username, real_name, data) > VALUES('foobar', 'foo bar') > ON CONFLICT (username) IGNORE > ON CONFLICT (real_name) UPDATE ...; > INSERT INTO persons (username, real_name, data) > VALUES('foobar', 'foo bar') > ON CONFLICT (username, real_name) IGNORE > ON CONFLICT (real_name) UPDATE username = excluded.username; > ON CONFLICT (username) UPDATE real_name = excluded.real_name; I like all of these suggestions, except that I think they reflect a couple things about the syntax which was never settled[1]. First, Robert suggested using DUPLICATE instead of CONFLICT, which I think it clearer. So the above would become: INSERT INTO persons (username, real_name, data) VALUES('foobar', 'foo bar') ON DUPLICATE (username), (real_name) IGNORE; INSERT INTO persons (username, real_name, data) VALUES('foobar', 'foo bar') ON DUPLICATE (username) IGNORE ON DUPLICATE (real_name) UPDATE ...; INSERT INTO persons (username, real_name, data) VALUES('foobar', 'foo bar') ON DUPLICATE (username, real_name) IGNORE ON DUPLICATE (real_name) UPDATE username = excluded.username; ON DUPLICATE (username) UPDATE real_name = excluded.real_name; Second, he suggested a shorthand way of specifying that all the values from the failed INSERT should be used for the UPDATE: INSERT INTO persons (username, real_name, data) VALUES('foobar', 'foo bar', 'baz') ON DUPLICATE (username) UPDATE; I think the first point got lost in the discussion of the second one. I don't think either point was ever really settled beyond Robert and I preferring ON DUPLICATE versus Peter preferring ON CONFLICT. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company [1] http://www.postgresql.org/message-id/CA+TgmoZN=2AJKi1n4Jz5BkmYi8r_CPUDW+DtoppmTeLVmsOoqw@mail.gmail.com
On 12/18/2014 05:46 PM, Kevin Grittner wrote: > I don't think either point was ever really settled beyond Robert > and I preferring ON DUPLICATE versus Peter preferring ON CONFLICT. I also prefer ON CONFLICT, because that makes more sense when you consider exclusion constraints, which I'm still hoping that this would support. If not immediately, at least in the future. - Heikki
Heikki Linnakangas <hlinnakangas@vmware.com> wrote: > On 12/18/2014 05:46 PM, Kevin Grittner wrote: >> I don't think either point was ever really settled beyond Robert >> and I preferring ON DUPLICATE versus Peter preferring ON CONFLICT. > > I also prefer ON CONFLICT, because that makes more sense when you > consider exclusion constraints, which I'm still hoping that this would > support. If not immediately, at least in the future. If you think this can be made to work without a UNIQUE btree index, that is a persuasive point in favor of ON CONFLICT. I had missed (or forgotten) that we thought this could work without a UNIQUE btree index as the basis of detecting when to resort to an UPDATE. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
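To make the exclusion constraint point concrete, here is a sketch (ordinary PostgreSQL DDL, nothing from the patch) of a constraint where a "conflict" is an overlap rather than a duplicate -- which is why CONFLICT generalizes where DUPLICATE would not:

```sql
-- An exclusion constraint: two rows conflict when they name the same room
-- and their time ranges overlap. btree_gist is needed to combine = on an
-- int column with && on a range column in one GiST constraint.
CREATE EXTENSION IF NOT EXISTS btree_gist;
CREATE TABLE bookings (
    room   int,
    during tsrange,
    EXCLUDE USING gist (room WITH =, during WITH &&)
);
```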
On Mon, Dec 15, 2014 at 11:06 PM, Peter Geoghegan <pg@heroku.com> wrote:
> On Mon, Dec 15, 2014 at 4:59 PM, Peter Geoghegan <pg@heroku.com> wrote:
>> On Mon, Dec 15, 2014 at 4:22 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
>>> Also, in both Linux and MinGW under option 1 patch I get an OID conflict on
>>> OID 3261.
>>
>> I'll take a pass at fixing this bitrot soon. I'll follow Tom's advice
>> about macro collisions on MinGW while I'm at it, since his explanation
>> seems plausible.
>
> Attached pair of revised patch sets fix the OID collision, and
> presumably fix the MinGW issue (because IGNORE_P is now used as a
> token name). It also polishes approach #2 to value locking in a few
> places (e.g. better comments). Finally, both patches have a minor
> buglet around EXPLAIN ANALYZE output fixed -- the output now indicates
> if tuples are pulled up from auxiliary update nodes.
I naively tried this in vallock1 patch:
create table foo(index int, count int);
create unique index on foo(index);
insert into foo (index, count) values (0,1) on conflict (index) update set count=foo.count + 1 returning foo.count;
insert into foo (index, count) values (0,1) on conflict (index) update set count=foo.count + 1 returning foo.count;
insert into foo (index, count) values (0,1) on conflict (index) update set count=foo.count + 1 returning foo.count;
insert into foo (index, count) values (0,1) on conflict (index) update set count=foo.count + 1 returning foo.count;
After actually reading the documentation more closely, I decided this should be an error because "foo" is not a valid table alias in the "update set" expression. Instead of being a parsing/planning error, this executes and the foo.count on the RHS of the assignment always evaluates as zero (even on subsequent invocations when TARGET.count is 1).
If I switch to a text type, then I get seg faults under the same condition:
create table foo(index int, count text);
create unique index on foo(index);
insert into foo (index, count) values (0,'start ') on conflict (index) update set count=foo.count||' bar' returning count;
insert into foo (index, count) values (0,'start ') on conflict (index) update set count=foo.count||' bar' returning count;
<boom>
#0 pg_detoast_datum_packed (datum=0x0) at fmgr.c:2270
#1 0x000000000074fb7a in textcat (fcinfo=0x1e67a78) at varlena.c:662
#2 0x00000000005a63a5 in ExecMakeFunctionResultNoSets (fcache=0x1e67a08, econtext=0x1e67848, isNull=0x1e68b11 "", isDone=<value optimized out>)
at execQual.c:2026
#3 0x00000000005a2353 in ExecTargetList (projInfo=<value optimized out>, isDone=0x7fffa7fa346c) at execQual.c:5358
#4 ExecProject (projInfo=<value optimized out>, isDone=0x7fffa7fa346c) at execQual.c:5573
#5 0x00000000005a86c2 in ExecScan (node=0x1e67738, accessMtd=0x5baf00 <SeqNext>, recheckMtd=0x5bad60 <SeqRecheck>) at execScan.c:207
#6 0x00000000005a1918 in ExecProcNode (node=0x1e67738) at execProcnode.c:406
#7 0x000000000059ef32 in EvalPlanQualNext (epqstate=<value optimized out>) at execMain.c:2380
#8 0x00000000005b8fcd in ExecLockUpdateTuple (node=0x1e5f750) at nodeModifyTable.c:1098
#9 ExecInsert (node=0x1e5f750) at nodeModifyTable.c:372
#10 ExecModifyTable (node=0x1e5f750) at nodeModifyTable.c:1396
#11 0x00000000005a1958 in ExecProcNode (node=0x1e5f750) at execProcnode.c:383
#12 0x00000000005a0642 in ExecutePlan (queryDesc=0x1dd0908, direction=<value optimized out>, count=0) at execMain.c:1515
#13 standard_ExecutorRun (queryDesc=0x1dd0908, direction=<value optimized out>, count=0) at execMain.c:308
#14 0x00007f601416b9cb in pgss_ExecutorRun (queryDesc=0x1dd0908, direction=ForwardScanDirection, count=0) at pg_stat_statements.c:874
#15 0x000000000069385f in ProcessQuery (plan=0x1e47df0,
....
So I think there needs to be some kind of logic to de-recognize the table alias "foo".
Once I rewrote the query to use TARGET and EXCLUDED correctly, I've put this through an adaptation of my usual torture test, and it ran fine until wraparound shutdown. I'll poke at it more later.
Cheers,
Jeff
On Thu, Dec 18, 2014 at 9:20 AM, Jeff Janes <jeff.janes@gmail.com> wrote: > After actually reading the documentation more closely, I decided this should > be an error because "foo" is not a valid table alias in the "update set" > expression. Instead of being a parsing/planning error, this executes and > the foo.count on the RHS of the assignment always evaluates as zero (even on > subsequent invocations when TARGET.count is 1). > > If I switch to a text type, then I get seg faults under the same condition: > So I think there needs to be some kind of logic to de-recognize the table > alias "foo". > > Once I rewrote the query to use TARGET and EXCLUDED correctly, I've put this > through an adaptation of my usual torture test, and it ran fine until > wraparound shutdown. I'll poke at it more later. Oops. I agree with your diagnosis, and will circle around to fix that bug in the next revision by, as you say, simply rejecting the query if it doesn't use the two standard aliases. Thanks for testing! -- Peter Geoghegan
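For reference, Jeff's rewritten query presumably used the patch's TARGET alias along these lines (a sketch inferred from the documented TARGET/EXCLUDED aliases, not quoted from the thread):

```sql
-- TARGET refers to the existing (conflicting) row; EXCLUDED refers to the
-- row proposed for insertion. Referencing the table name "foo" directly in
-- the SET expression is what the patch will now reject.
INSERT INTO foo (index, count) VALUES (0, 1)
    ON CONFLICT (index)
    UPDATE SET count = TARGET.count + 1
    RETURNING count;
```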
On Thu, Dec 18, 2014 at 7:51 AM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote: > On 12/18/2014 05:46 PM, Kevin Grittner wrote: >> >> I don't think either point was ever really settled beyond Robert >> and I preferring ON DUPLICATE versus Peter preferring ON CONFLICT. > > > I also prefer ON CONFLICT, because that makes more sense when you consider > exclusion constraints, which I'm still hoping that this would support. If > not immediately, at least in the future. This was why I changed the spelling to ON CONFLICT. It also doesn't hurt that that spelling is dissimilar to MySQL's syntax, IMV, because there are plenty of things to dislike about ON DUPLICATE KEY UPDATE, and I think a veneer of compatibility is inappropriate - this syntax is both considerably more flexible and considerably safer. -- Peter Geoghegan
On Thu, Dec 18, 2014 at 6:59 AM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote: >> Good point. >> >> For the IGNORE case: I guess the syntax just isn't that flexible. I >> agree that that isn't ideal. > > > It should be simple to allow multiple key specifications: > > INSERT INTO persons (username, real_name, data) > VALUES('foobar', 'foo bar') > ON CONFLICT (username), (real_name) IGNORE; > > It's a rather niche use case, but might as well support it for the sake of > completeness. I guess that wouldn't be very hard to implement, and perhaps we should do so soon. I am reluctant to let scope creep too far, though. As you mentioned, this is a niche use case. >> I think that the long and the short of it is that you really ought to >> have one unique index as an arbiter in mind when writing a DML >> statement for the UPDATE variant. Relying on this type of convention >> is possible, I suppose, but ill-advised. > > Another thought is that you might want to specify a different action > depending on which constraint is violated: > > INSERT INTO persons (username, real_name, data) > VALUES('foobar', 'foo bar') > ON CONFLICT (username) IGNORE > ON CONFLICT (real_name) UPDATE ...; > > Although that leaves the question of what to do if both are violated. > Perhaps: > > INSERT INTO persons (username, real_name, data) > VALUES('foobar', 'foo bar') > ON CONFLICT (username, real_name) IGNORE > ON CONFLICT (real_name) UPDATE username = excluded.username; > ON CONFLICT (username) UPDATE real_name = excluded.real_name; I think that there might be a place for that, but I'd particularly like to avoid figuring this out now - this suggestion is a complicated new direction for the patch, and it's not as if adding this kind of flexibility is precluded by not allowing it in the first version - we won't paint ourselves into a corner by not doing this up front. The patch is already complicated enough! 
Users can always have multiple UPSERT commands, and that might be very close to good enough for a relatively rare use case like this. >> I could easily have the unique index inference specification accept a >> named opclass, if you thought that was important, and you thought >> naming a non-default opclass by name was a good SQL interface. It >> would take only a little effort to support non-default opclasses. > > It's a little weird to mention an opclass by name. It's similar to naming an > index by name, really. How about naming the operator? For an exclusion > constraint, that would be natural, as the syntax to create an exclusion > constraint in the first place is "EXCLUDE USING gist (c WITH &&)" > > Naming the index by columns makes sense in most cases, and I don't like > specifying the index's name, but how about allowing naming a constraint? > Indexes are just an implementation detail, but constraints are not. Unique > and exclusion constraints are always backed by an index, so there is little > difference in practice, but I would feel much more comfortable mentioning > constraints by name than indexes. The main reason for naming a constraint by name in practice will probably be because there is no better way to deal with partial unique indexes (which can be quite useful). But partial unique indexes aren't formally constraints, in that they don't have pg_constraint entries. So I don't think that that's going to be acceptable, entirely for that reason. :-( > Most people would list the columns, but if there is a really bizarre > constraint, with non-default opclasses, or an exclusion constraint, it's > probably been given a name that you could use. What I find curious about the opclass thing is: when do you ever have an opclass that has a different idea of equality than the default opclass for the type? In other words, when is B-Tree strategy number 3 not actually '=' in practice, for *any* B-Tree opclass? 
Certainly, it doesn't appear to be the case that it isn't so with any shipped opclasses - the shipped non-default B-Tree opclasses only serve to provide alternative notions of sort order, and never "equals". I think that with B-Tree (which is particularly relevant for the UPDATE variant), it ought to be defined to work with the type's default opclass "equals" operator, just like GROUP BY and DISTINCT. Non-default opclass unique indexes work just as well in practice, unless someone somewhere happens to create an oddball one that doesn't use '=' as its "equals" operator (while also having '=' as the default opclass "equals" operator). I am not aware that that leaves any actually shipped opclass out (and I include our external extension ecosystem here, although I might be wrong about that part). > In theory, with the promise tuple approach to locking, you don't necessarily > even need an index to back up the constraint. > So probably better to not allow it. I agree that we definitely want to require that there is an appropriate index available. I think we can live without support for partial unique indexes for the time being. With non-default opclasses effectively handled (by caring about the "equals" operator only, and acceptable non-default opclass indexes when that happens to match the default's), and by assuming that having an INSERT ... ON CONFLICT IGNORE without an inference specification to find an exclusion constraint is enough, we have acceptable semantics, IMV. The worst part of that is that partial unique indexes cannot be used with the ON CONFLICT UPDATE variant, in my opinion, but Rome wasn't built in a day. It would be nice to have a way of discriminating against particular indexes (unique constraint-related, partial unique, or otherwise) for the IGNORE variant, but I fear that that'll be difficult to figure out in time. 
There is no need to address those questions in the first version, since I don't think we're failing to play nice with another major feature. We already have something much more flexible than the equivalent features in other major systems.

--
Peter Geoghegan
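The question upthread about strategy number 3 can be checked directly against the catalogs: for any B-Tree opclass, the strategy 3 entry of its operator family is its "equals" operator. A sketch of such a query (runnable on a stock installation; the text opclasses serve here as the example):

```sql
-- Show the "equals" (strategy 3) operator of every B-Tree opclass
-- whose input type is text; in a stock installation every row
-- reports plain "=".
SELECT opc.opcname, op.oprname AS equals_operator
FROM pg_opclass opc
JOIN pg_am am ON am.oid = opc.opcmethod
JOIN pg_amop amop ON amop.amopfamily = opc.opcfamily
                 AND amop.amoplefttype = opc.opcintype
                 AND amop.amoprighttype = opc.opcintype
JOIN pg_operator op ON op.oid = amop.amopopr
WHERE am.amname = 'btree'
  AND opc.opcintype = 'text'::regtype
  AND amop.amopstrategy = 3;
```

Repeating the query without the opcintype filter surveys every B-Tree opclass at once, which is how one would hunt for a counter-example.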
On Fri, Dec 19, 2014 at 05:32:43PM -0800, Peter Geoghegan wrote:
> > Most people would list the columns, but if there is a really bizarre
> > constraint, with non-default opclasses, or an exclusion constraint, it's
> > probably been given a name that you could use.
>
> What I find curious about the opclass thing is: when do you ever have
> an opclass that has a different idea of equality than the default
> opclass for the type? In other words, when is B-Tree strategy number 3
> not actually '=' in practice, for *any* B-Tree opclass? Certainly, no
> shipped opclass appears to be a counter-example - the shipped
> non-default B-Tree opclasses only serve to provide alternative notions
> of sort order, never of "equals".

Well, in theory you could build a case-insensitive index on a text column. You could argue that the column should have been defined as citext in the first place, but it might not be, for various reasons.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> He who writes carelessly confesses thereby at the very outset that he does
> not attach much importance to his own thoughts.
-- Arthur Schopenhauer
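As an aside, the common workaround for this scenario today is a unique expression index over lower(), rather than a custom opclass. A sketch against a hypothetical table (not something from the patch):

```sql
-- Case-insensitive uniqueness on a plain text column, without citext:
CREATE TABLE users (email text NOT NULL);
CREATE UNIQUE INDEX users_email_ci ON users (lower(email));

INSERT INTO users VALUES ('foo@example.com');
INSERT INTO users VALUES ('FOO@example.com');  -- rejected as a duplicate
```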
On Sat, Dec 20, 2014 at 2:16 AM, Martijn van Oosterhout <kleptog@svana.org> wrote:
>> What I find curious about the opclass thing is: when do you ever have
>> an opclass that has a different idea of equality than the default
>> opclass for the type? In other words, when is B-Tree strategy number 3
>> not actually '=' in practice, for *any* B-Tree opclass? Certainly, no
>> shipped opclass appears to be a counter-example - the shipped
>> non-default B-Tree opclasses only serve to provide alternative notions
>> of sort order, never of "equals".
>
> Well, in theory you could build a case-insensitive index on a text
> column. You could argue that the column should have been defined as
> citext in the first place, but it might not for various reasons.

That generally works in other systems by having a case-insensitive collation. I don't know if that implies that non-bitwise-identical items can be equal according to the "equals" operator in those other systems. There aren't too many examples of that happening in general (I can only think of citext and numeric offhand), presumably because it necessitates a normalization process (such as lower-casing in the case of citext) within the hash opclass support function 1, a process best avoided.

citext is an interesting precedent that supports my argument above, because citext demonstrates that we preferred to create a new type rather than a new non-default opclass (with a non-'=' "equals" operator) when the time came to introduce a new concept of "equals" (and not merely a new, alternative sort order). Again, this is surely due to the system's dependency on the default B-Tree opclass for the purposes of GROUP BY and DISTINCT, whose behavior doesn't necessarily involve sort ordering at all.

--
Peter Geoghegan
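To make the citext precedent concrete, a minimal sketch (citext ships in contrib; the table here is hypothetical):

```sql
CREATE EXTENSION IF NOT EXISTS citext;

CREATE TABLE emails (addr citext UNIQUE);
INSERT INTO emails VALUES ('Foo@Example.com');

-- citext's "equals" operator compares case-insensitively, so this
-- non-bitwise-identical value is still a duplicate:
INSERT INTO emails VALUES ('foo@example.com');  -- unique violation
```

The type-level approach means GROUP BY, DISTINCT, hashing and unique enforcement all agree on the case-insensitive notion of equality, which a non-default opclass alone could not guarantee.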
On Thu, Dec 18, 2014 at 9:31 AM, Peter Geoghegan <pg@heroku.com> wrote:
>> So I think there needs to be some kind of logic to de-recognize the table
>> alias "foo".
>>
>> Once I rewrote the query to use TARGET and EXCLUDED correctly, I've put this
>> through an adaptation of my usual torture test, and it ran fine until
>> wraparound shutdown. I'll poke at it more later.
>
> Oops. I agree with your diagnosis, and will circle around to fix that
> bug in the next revision

Attached patch fixes the bug. I'm not delighted about the idea of cutting off the parent parse state (the parse state of the INSERT) within transformUpdateStmt() only once we've used the parent state to establish that this is a "speculative"/auxiliary update, but it's probably the path of least resistance here.

When this is rolled into the next version, there will be a test case.

Thanks
--
Peter Geoghegan
Attachment
On Sun, Dec 21, 2014 at 6:56 AM, Peter Geoghegan <pg@heroku.com> wrote:
> On Thu, Dec 18, 2014 at 9:31 AM, Peter Geoghegan <pg@heroku.com> wrote:
>>> So I think there needs to be some kind of logic to de-recognize the table
>>> alias "foo".
>>>
>>> Once I rewrote the query to use TARGET and EXCLUDED correctly, I've put this
>>> through an adaptation of my usual torture test, and it ran fine until
>>> wraparound shutdown. I'll poke at it more later.
>>
>> Oops. I agree with your diagnosis, and will circle around to fix that
>> bug in the next revision
>
> Attached patch fixes the bug. I'm not delighted about the idea of
> cutting off parent parse state (the parse state of the insert) within
> transformUpdateStmt() only once we've used the parent state to
> establish that this is a "speculative"/auxiliary update, but it's
> probably the path of least resistance here.
>
> When this is rolled into the next version, there will be a testcase.

Looking at this thread, the last version of this patch is available here:
http://www.postgresql.org/message-id/CAM3SWZRvkCKc=1Y6_Wn8mk97_Vi8+j-aX-RY-=msrJVU-Ec-qw@mail.gmail.com
And they do not apply correctly, so this patch needs a rebase.

Regards,
--
Michael
On Sun, Dec 21, 2014 at 6:10 PM, Michael Paquier <michael.paquier@gmail.com> wrote:
> Looking at this thread, the last version of this patch is available here:
> http://www.postgresql.org/message-id/CAM3SWZRvkCKc=1Y6_Wn8mk97_Vi8+j-aX-RY-=msrJVU-Ec-qw@mail.gmail.com
> And they do not apply correctly, so this patch needs a rebase.

That isn't so. The latest version is much more recent than that. It's available here:

http://www.postgresql.org/message-id/CAM3SWZQTqsCLZ1YJ1OuWFpO-GmFHwtgwTOg+o_NNzxrPa7Cx4A@mail.gmail.com

Everything is tracked in the commitfest app in detail.

--
Peter Geoghegan
On Mon, Dec 22, 2014 at 11:20 AM, Peter Geoghegan <pg@heroku.com> wrote:
> On Sun, Dec 21, 2014 at 6:10 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> Looking at this thread, the last version of this patch is available here:
>> http://www.postgresql.org/message-id/CAM3SWZRvkCKc=1Y6_Wn8mk97_Vi8+j-aX-RY-=msrJVU-Ec-qw@mail.gmail.com
>> And they do not apply correctly, so this patch needs a rebase.
>
> That isn't so. The latest version is much more recent than that. It's
> available here:
>
> http://www.postgresql.org/message-id/CAM3SWZQTqsCLZ1YJ1OuWFpO-GmFHwtgwTOg+o_NNzxrPa7Cx4A@mail.gmail.com
>
> Everything is tracked in the commitfest app in detail.

Oops, sorry. I was mistaken because of the name of the latest attachments.

--
Michael
On 12/20/2014 11:14 PM, Peter Geoghegan wrote:
> On Sat, Dec 20, 2014 at 2:16 AM, Martijn van Oosterhout
> <kleptog@svana.org> wrote:
>>> What I find curious about the opclass thing is: when do you ever have
>>> an opclass that has a different idea of equality than the default
>>> opclass for the type? In other words, when is B-Tree strategy number 3
>>> not actually '=' in practice, for *any* B-Tree opclass? Certainly, no
>>> shipped opclass appears to be a counter-example - the shipped
>>> non-default B-Tree opclasses only serve to provide alternative notions
>>> of sort order, never of "equals".
>>
>> Well, in theory you could build a case insensetive index on a text
>> column. You could argue that the column should have been defined as
>> citext in the first place, but it might not for various reasons.
>
> That generally works in other systems by having a case-insensitive
> collation. I don't know if that implies that non bitwise identical
> items can be equal according to the "equals" operator in those other
> systems. There aren't too many examples of that happening in general
> (I can only think of citext and numeric offhand), presumably because
> it necessitates a normalization process (such as lower-casing in the
> case of citext) within the hash opclass support function 1, a process
> best avoided.
>
> citext is an interesting precedent that supports my argument above,
> because citext demonstrates that we preferred to create a new type
> rather than a new non-default opclass (with a non-'=' "equals"
> operator) when the time came to introduce a new concept of "equals"
> (and not merely a new, alternative sort order). Again, this is surely
> due to the system's dependency on the default B-Tree opclass for the
> purposes of GROUP BY and DISTINCT, whose behavior doesn't necessarily
> involve sort ordering at all.

Yeah, I don't expect it to happen very often. It's confusing to have multiple definitions of equality.
There is one built-in example: the "record *= record" operator [1]. It's quite special-purpose; the docs even say that such operators "are not intended to be generally useful for writing queries". But there they are.

I feel that it needs to be possible to specify the constraint unambiguously in all cases. These are very rare use cases, but we should have an escape hatch for the rare cases that need it.

What would it take to also support partial indexes?

[1] See http://www.postgresql.org/docs/devel/static/functions-comparisons.html#ROW-WISE-COMPARISON

- Heikki
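The disagreement between the two notions of equality is easy to demonstrate with numeric, whose values carry a display scale that "=" ignores but the binary-image operator sees:

```sql
SELECT ROW(1.0::numeric) =  ROW(1.00::numeric);  -- true: same value
SELECT ROW(1.0::numeric) *= ROW(1.00::numeric);  -- false: the stored
                                                 -- binary images differ
```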
On Mon, Dec 22, 2014 at 1:24 PM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
> I feel that it needs to be possible to specify the constraint unambiguously
> in all cases. These are very rare use cases, but we should have an escape
> hatch for the rare cases that need it.
>
> What would it take to also support partial indexes?

Aside from considerations about how to pick them without using their name, partial unique indexes aren't special at all. My earlier concern was that we'd need to account for before row insert triggers that change values out from under us. But maybe that concern was overblown, come to think of it.

I am already borrowing a little bit of the raw parser's logic for CREATE INDEX statements for unique index inference (during parse analysis) -- we're matching the cataloged index definition attributes/expressions, so this makes a lot of sense. Maybe I had the wrong idea about partial indexes earlier, which was that we must use the values in the tuple proposed for insertion to check that a partial index was a suitable arbiter of whether or not the UPDATE path should be taken with respect to any given tuple. I should just go further with borrowing things from CREATE INDEX, and give the user an optional way of specifying a WHERE clause that is also matched in a similar way to the expressions themselves. Did the partial unique index your UPSERT implied fail to cover the ultimate tuple inserted, after before row insert triggers fired? That's on you as a user... you'll always get an insert, since there won't be a would-be duplicate violation to make there be an update.

I actually care about partial unique indexes a lot. They're a very useful feature. Back when I was an application developer, I frequently used "is_active" boolean columns to represent "logical app-level deletion", where actually deleting the tuple was not possible (e.g. because it may still be referenced in historic records), while not wanting to have it be subject to uniqueness checks as a logically deleted/!is_active tuple.

This measure to support partial indexes, plus the additional leeway around non-default opclass unique indexes that I can add (that they need only match the "equals" operator of the default opclass to be accepted), brings us 99.9% of the way. That only leaves:

* An inability to specify some subset of unique indexes or exclusion constraints for the IGNORE variant (the UPDATE variant is irrelevant).

* An inability to specify an arbitrating *exclusion constraint* as the sole arbiter of whether or not the IGNORE path should be taken (exclusion constraints are not usable for the UPDATE variant, so that's irrelevant again).

Did I forget something? The use cases around these limitations are very rare, and only apply to the IGNORE variant, which seems much less interesting. I'm quite comfortable dealing with them in a later release of PostgreSQL, to cut scope (or avoid adding scope) for 9.5. Do you think that's okay? How often will the IGNORE variant be used when everything shouldn't be IGNOREd anyway?

Although, to be totally fair, I should probably also include:

* Non-default B-Tree opclasses cannot be specified as arbiters of the alternative path (for both IGNORE and UPDATE variants) iff their "equals" operator happens to not be the "equals" operator of the default opclass (which is theoretical, and likely non-existent as a use case).

If you're dead set on having an escape hatch, maybe we should just get over it and add a way of specifying a unique index by name. As I said, these under-served use cases are either exceedingly rare or entirely theoretical.

--
Peter Geoghegan
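The "logical deletion" pattern described above might look like this (table and column names are hypothetical, not from the patch's regression tests):

```sql
-- Rows are never physically deleted; is_active is flipped instead.
-- Only active rows participate in the uniqueness check.
CREATE TABLE accounts (
    username  text    NOT NULL,
    is_active boolean NOT NULL DEFAULT true
);
CREATE UNIQUE INDEX accounts_active_username_idx
    ON accounts (username) WHERE is_active;
```

An inference clause that could match this index's WHERE clause, in the same way the cataloged index expressions are already matched, is exactly the extension being proposed.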
On Mon, Dec 22, 2014 at 5:04 PM, Peter Geoghegan <pg@heroku.com> wrote:
> If you're dead set on having an escape hatch, maybe we should just get
> over it and add a way of specifying a unique index by name. As I said,
> these under-served use cases are either exceedingly rare or entirely
> theoretical.

I'm decidedly unenthusiastic about that. People don't expect CREATE INDEX CONCURRENTLY + DROP INDEX CONCURRENTLY to break their DML. I think the solution in this case would be a gateway to problems larger than the one we're trying to solve.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, Dec 23, 2014 at 5:46 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Dec 22, 2014 at 5:04 PM, Peter Geoghegan <pg@heroku.com> wrote:
>> If you're dead set on having an escape hatch, maybe we should just get
>> over it and add a way of specifying a unique index by name. As I said,
>> these under-served use cases are either exceedingly rare or entirely
>> theoretical.
>
> I'm decidedly unenthusiastic about that. People don't expect CREATE
> INDEX CONCURRENTLY + DROP INDEX CONCURRENTLY to break their DML. I
> think the solution in this case would be a gateway to problems larger
> than the one we're trying to solve.

I tend to agree. I think we should just live with the fact that not every conceivable use case will be covered, at least initially. Then, if an appreciable demand for even more flexibility emerges, we can revisit this. We already have a syntax that is significantly more flexible than the equivalent feature in any other system. Let's not lose sight of that.

--
Peter Geoghegan
On Tue, Dec 23, 2014 at 11:30 AM, Peter Geoghegan <pg@heroku.com> wrote:
> I tend to agree. I think we should just live with the fact that not
> every conceivable use case will be covered, at least initially.

To be clear: I still think I should go and make the changes that will make the feature play nice with all shipped non-default B-Tree operator classes, and will make it work with partial unique indexes [1]. That isn't difficult or controversial, AFAICT, and gets us very close to satisfying every conceivable use case.

[1] http://www.postgresql.org/message-id/CAM3SWZQdv7GDLwPRv7=rE-gG1QjLOOL3vCmAriCBcTYk8GwqKw@mail.gmail.com

--
Peter Geoghegan
* Peter Geoghegan (pg@heroku.com) wrote:
> On Tue, Dec 23, 2014 at 5:46 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> > On Mon, Dec 22, 2014 at 5:04 PM, Peter Geoghegan <pg@heroku.com> wrote:
> >> If you're dead set on having an escape hatch, maybe we should just get
> >> over it and add a way of specifying a unique index by name. As I said,
> >> these under-served use cases are either exceedingly rare or entirely
> >> theoretical.
> >
> > I'm decidedly unenthusiastic about that. People don't expect CREATE
> > INDEX CONCURRENTLY + DROP INDEX CONCURRENTLY to break their DML. I
> > think the solution in this case would be a gateway to problems larger
> > than the one we're trying to solve.
>
> I tend to agree. I think we should just live with the fact that not
> every conceivable use case will be covered, at least initially. Then,
> if an appreciable demand for even more flexibility emerges, we can
> revisit this. We already have a syntax that is significantly more
> flexible than the equivalent feature in any other system. Let's not
> lose sight of that.

+1

Thanks,

Stephen
On Thu, Dec 18, 2014 at 9:20 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
> I've put this through an adaptation of my usual torture test, and it ran
> fine until wraparound shutdown. I'll poke at it more later.

Could you elaborate, please? What are the details of the torture test you're performing?

--
Peter Geoghegan
On Fri, Dec 19, 2014 at 5:32 PM, Peter Geoghegan <pg@heroku.com> wrote:
>> Most people would list the columns, but if there is a really bizarre
>> constraint, with non-default opclasses, or an exclusion constraint, it's
>> probably been given a name that you could use.
>
> What I find curious about the opclass thing is: when do you ever have
> an opclass that has a different idea of equality than the default
> opclass for the type? In other words, when is B-Tree strategy number 3
> not actually '=' in practice, for *any* B-Tree opclass? Certainly, no
> shipped opclass appears to be a counter-example - the shipped
> non-default B-Tree opclasses only serve to provide alternative notions
> of sort order, never of "equals".
>
> I think that with B-Tree (which is particularly relevant for the
> UPDATE variant), it ought to be defined to work with the type's
> default opclass "equals" operator, just like GROUP BY and DISTINCT.
> Non-default opclass unique indexes work just as well in practice,
> unless someone somewhere happens to create an oddball one that doesn't
> use '=' as its "equals" operator (while also having '=' as the default
> opclass "equals" operator). I am not aware that that leaves any
> actually shipped opclass out (and I include our external extension
> ecosystem here, although I might be wrong about that part).

So looking at the way the system deals with its dependence on default operator classes, I have a hard time justifying all this extra overhead for the common case. The optimizer will refuse to use an index with a non-default opclass even when AFAICT there is no *real* semantic dependence on anything other than the "equals" operator, which seems to always match across a type's opclasses anyway.
For example, DISTINCT will not use a non-default opclass B-Tree index, even though in practice the "equals" operator always matches for shipped non-default opclasses; DISTINCT will not work with a text_pattern_ops index, while it will work with a default text B-Tree opclass index, *even though no corresponding "ORDER BY" was given*.

Someone recently pointed out in a dedicated thread that the system isn't all that bright about exploiting the fact that group aggregates don't necessarily need to care about facets of sort ordering like collations, which have additional overhead [1]. That might be a useful special case to target (to make underlying sorts faster), but the big picture is that the system doesn't know when it only needs to care about an "equals" operator matching some particular B-Tree-opclass-defined notion of sorting, rather than caring about a variety of operators matching. Sometimes, having a matching "equals" operator of some non-default opclass is good enough to make an index (or sort scheme) of that opclass usable for some purpose that only involves equality, and not sort order (like DISTINCT, with no ORDER BY, executed using a GroupAggregate, for example).

I thought we should formalize the idea that a non-default opclass must have the same notion of equality (the same "equals" operator) as its corresponding default opclass, if any. That way, presumably, the optimizer has license to be clever about only caring about "DISTINCTness"/equality. That also gives my implementation license to not care about which operator class a unique index uses -- it must not matter.

Heikki pointed out that there is one shipped opclass that has an "equals" operator that happens to not be spelt "=" [2] (and furthermore, does not match that of the default opclass). That's the record_image_ops opclass, which unusually has an "equals" operator of "*=".
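A sketch of the asymmetry described above, against a hypothetical table (actual plans depend on statistics, so no particular EXPLAIN output is implied):

```sql
CREATE TABLE t (col text);

-- A text_pattern_ops index changes sort order (useful for LIKE prefix
-- matching under a non-C locale), but its "equals" operator is still "=":
CREATE INDEX t_pattern_idx ON t (col text_pattern_ops);
EXPLAIN SELECT * FROM t WHERE col LIKE 'abc%';

-- DISTINCT, however, is tied to the default opclass's notion of
-- equality, and will not consider t_pattern_idx:
EXPLAIN SELECT DISTINCT col FROM t;
```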
So, as Heikki pointed out, it looks like there is some limited precedent for having to worry about B-Tree opclasses that introduce alternative notions of "equals", rather than merely alternative notions of sort order. So much for formalizing that all of a type's B-Tree opclass "equals" operators must match...

...having thought about it for a while more, though, I think we should *still* ignore the opclass for the purposes of unique index inference. The implementation doesn't care about the fact that you used a non-default opclass. Sure, in theory that could lead to inconsistencies, if there were multiple unique indexes of multiple opclasses that just so happened to have incompatible ideas about equality, but that seems ludicrous... we have only one extremely narrow example of how that could happen. Plus there'd have to be *both* unique indexes defined and available for us to infer as appropriate before the inference logic could accidentally infer the wrong idea of equality. That seems like an extremely implausible scenario.

Even if we allow for the idea that alternative notions of equality are something that will happen in the wild, obviously the user cares about the definition of equality that they actually used for the unique index in question. We can document that unique index inference doesn't care about opclasses (recall that I still only plan on letting users infer a B-Tree unique index), which is thought to almost certainly not matter. I think that ought to be fine.

In the next revision of UPSERT, the implementation formally won't care about the opclass of an index when inferring a unique index to use as an arbiter of whether to take the alternative IGNORE/UPDATE path. That's formally left undefined. As already discussed, I will still proceed with allowing the user to pick a partial unique index when writing a unique index inference specification.
[1] http://www.postgresql.org/message-id/CAFjtmHU3Obf5aSpWY7i18diapvjg-418hYySdqUuYhXZtjChhg@mail.gmail.com [2] http://www.postgresql.org/message-id/54988BF5.9000405@vmware.com -- Peter Geoghegan
On Tue, Dec 23, 2014 at 11:55 AM, Peter Geoghegan <pg@heroku.com> wrote:
On Thu, Dec 18, 2014 at 9:20 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
> I've put this through an adaptation of my usual torture test, and it ran
> fine until wraparound shutdown. I'll poke at it more later.
Could you elaborate, please? What are the details of the torture test
you're performing?
I've uploaded it here.
The gist of it is that I increment a count column of a random row (via pk) in multiple connections simultaneously.
When the server crashes, or it gets to a certain number of increments, the threads report their activity up to the parent, which then waits for automatic recovery and compares the state of the database to the reported state of the children threads.
That is for my original code. For this purpose, I made the count go either up or down randomly, and when a row's count passes through zero it gets deleted. Then when it is chosen for increment/decrement again, it has to be inserted. I've made this happen either through an update-or-insert-or-retry loop (two variants) or by using your new syntax.
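For reference, the update-or-insert-or-retry loop being compared against might look roughly like the well-known example from the PostgreSQL documentation (a sketch only; the table and function names here are hypothetical, not the actual harness code):

```sql
CREATE OR REPLACE FUNCTION bump(p_id int, p_delta int) RETURNS void AS $$
BEGIN
    LOOP
        -- First try to update an existing row.
        UPDATE counts SET n = n + p_delta WHERE id = p_id;
        IF found THEN
            RETURN;
        END IF;
        -- Not there: try to insert. A concurrent insert of the same key
        -- raises unique_violation, in which case we loop and retry.
        BEGIN
            INSERT INTO counts (id, n) VALUES (p_id, p_delta);
            RETURN;
        EXCEPTION WHEN unique_violation THEN
            -- Do nothing; loop back and try the UPDATE again.
        END;
    END LOOP;
END;
$$ LANGUAGE plpgsql;
```

The subtransaction set up by the BEGIN/EXCEPTION block is part of what the proposed ON CONFLICT syntax avoids.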
There is a patch which adds a simulation for a torn-page-write followed by a crash, and also adds some elogs that I've sometimes found useful for tracking down problems, with new GUCs to control them.
I don't think you made changes to the WAL/recovery routines, so I don't expect crashing recovery to be a big hazard for your patch, but I wanted to run a test where I was generally familiar with the framework, and thought an independently derived test might exercise some new aspects.
The one thing I noticed is that using your syntax starts out slightly slower than the retry loop, but then gets much slower (down by 2 or 3 times) after a while. It might be a vacuuming issue. The constant intentional crashes interfere with good vacuuming behavior, and I need to retest this with the intentional crashes turned off to see if that fixes it. I'm having difficulty accessing my usual testing hardware over the holidays, so I'm not getting as much done as I hoped.
I'll try to look at your own stress tests on github as well.
Cheers,
Jeff
On Sat, Dec 27, 2014 at 11:48 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
>> Could you elaborate, please? What are the details of the torture test
>> you're performing?
>
> The gist of it is that I increment a count column of a random row (via pk)
> in multiple connections simultaneously.

This is great. In general, I strongly believe that we should be doing this kind of thing more formally and more frequently. Thanks!

> That is for my original code. For this purpose, I made the count go either
> up or down randomly, and when a row's count passes through zero it gets
> deleted. Then when it is chosen for increment/decrement again, it has to be
> inserted. I've made this happen either through an update-or-insert-or-retry
> loop (two variants) or by using your new syntax.

Did you continue to limit your investigation to value locking approach #1? I think that #2 is the more likely candidate for commit, and that we should focus on it. However, #1 is more "conceptually pure", and is therefore an interesting basis of comparison with #2 when doing this kind of testing.

> There is a patch which adds a simulation for a torn-page-write followed by a
> crash, and also adds some elogs that I've sometimes found useful for
> tracking down problems, with new GUCs to control them.

Cool.

> I don't think you made changes to the WAL/recovery routines, so I don't
> expect crashing recovery to be a big hazard for your patch, but I wanted to
> run a test where I was generally familiar with the framework, and thought an
> independently derived test might exercise some new aspects.

Value locking approach #2 does touch crash recovery; value locking approach #1 does not. I certainly see the logic in starting with independently derived tests. We all have our blind spots.

> The one thing I noticed is that using your syntax starts out slightly slower
> than the retry loop, but then gets much slower (down by 2 or 3 times) after
> a while. It might be a vacuuming issue.

Interesting.
I'd like to compare both approaches to value locking here.

> I'll try to look at your own stress tests on github as well.

Would you be opposed to merging your custom stress-test suite into my git repo? I'll give you the ability to push to it.

I can help you out if you think you'd benefit from access to my quad-core server (Intel Core i7-4770) for stress-testing. I'll coordinate with you about it privately.

--
Peter Geoghegan
On Fri, Dec 26, 2014 at 4:22 PM, Peter Geoghegan <pg@heroku.com> wrote:
> So looking at the way the system deals with its dependence on default
> operator classes, I have a hard time justifying all this extra
> overhead for the common case.

Attached pair of revised patch sets, V1.8:

* Explicitly leaves undefined what happens when a non-default opclass index *with an alternative notion of not just sort order, but equality* exists. In practice it depends on the available unique indexes. I really found it impossible to justify imposing any restriction here, given the total lack of a scenario in which this even *could* matter, let alone will. This is a minor wart, but I think it's acceptable.

* Allows the "unique index inference specification" clause to have a WHERE clause (this is distinct from the WHERE clause that might also appear in the UPDATE auxiliary query). This can be used to infer partial unique indexes. I really didn't want to give up support for partial indexes with the UPDATE variant (recall that the UPDATE variant *requires* an inference clause), since partial unique indexes are particularly useful. Note that the unique index must actually cover the tuple at insert time, or an error is raised. An example of this that appears in the regression tests is:

insert into insertconflicttest values (23, 'Uncovered by Index') on conflict (key where fruit like '%berry') ignore;
ERROR: partial arbiter unique index has predicate that does not cover tuple proposed for insertion
DETAIL: ON CONFLICT inference clause implies that the tuple proposed for insertion actually be covered by partial predicate for index "partial_key_index".
HINT: ON CONFLICT inference clause must infer a unique index that covers the final tuple, after BEFORE ROW INSERT triggers fire.

* New documentation reflecting the above. A couple of paragraphs in the INSERT SQL reference page now cover these topics.

* Fixed Jeff Janes's bug by adding sanitizing code [1]. Certain illegal queries are now correctly rejected during parse analysis.

* Fixed another tiny buglet in EXPLAIN ANALYZE output with a RETURNING clause, by making sure the auxiliary query plan from the update also has its plan-level targetlist set.

* Minor clean-up to code comments here and there (in particular, for the ExcludedExpr primnode used to implement the EXCLUDED.* pseudo-alias).

* Better serialization failure error messages.

I recommend looking at my mirror of the modified documentation:

http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/on-conflict-docs/sql-insert.html

to get up to speed on how the unique index inference specification clause has been extended to support partial unique indexes. As I mentioned, apart from that, the INSERT SQL reference page now covers the definition of a "conflict" and the opclass semantics issues.

I really hope that this deals with all semantics/syntax related loose ends, allowing discussion of this patch to take a more low-level focus, which is what is really needed. I feel that further improvements may be possible, and that the syntax can be even more flexible, but it's already flexible enough for our first iteration of this feature. Importantly, we have something that is enormously more flexible than any equivalent feature in any other system, which includes the flexibility to extend the syntax in various other directions (e.g. specifying particular exclusion constraints).

[1] http://archives.postgresql.org/message-id/CAM3SWZT=HptrGyihZiyT39sPBhp+CXOTW=MhNFzXiLf-Jh4QVA@mail.gmail.com

--
Peter Geoghegan
Attachment
On Sun, Dec 28, 2014 at 3:19 PM, Peter Geoghegan <pg@heroku.com> wrote:
On Fri, Dec 26, 2014 at 4:22 PM, Peter Geoghegan <pg@heroku.com> wrote:
> So looking at the way the system deals with its dependence on default
> operator classes, I have a hard time justifying all this extra
> overhead for the common case.
Attached pair of revised patch sets, V1.8:
Hi Peter,
Using the vallock2 version of V1.8, using the test I previously described, I get some all-null rows, which my code should never create. Also, the index and table don't agree, in this example I find 3 all-null rows in the table, but only 2 in the index. I've attached an example output of querying via index and via full table scan, and also the pageinspect output of the blocks which have the 3 rows, in case that is helpful.
This was just a straight forward issue of firing queries at the database, the crash-inducing part of my test harness was not active during this test. I also ran it with my crashing patch reversed out, in case I introduced the problem myself, and it still occurs.
Using V1.7 of the vallock2 patch, I saw the same thing with some all-null rows. I also saw some other issues where two rows with the same key value would be present twice in the table (violating the unique constraint) but only one of them would appear in the index. I suspect it is caused by the same issue as the all-null rows, and maybe I just didn't run v1.8 enough times to find that particular manifestation under v1.8.
Cheers,
Jeff
Attachment
Hi Jeff,

On Mon, Dec 29, 2014 at 2:29 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> Using the vallock2 version of V1.8, using the test I previously described, I
> get some all-null rows, which my code should never create. Also, the index
> and table don't agree, in this example I find 3 all-null rows in the table,
> but only 2 in the index. I've attached an example output of querying via
> index and via full table scan, and also the pageinspect output of the blocks
> which have the 3 rows, in case that is helpful.

Interesting. Thanks a lot for your help!

> This was just a straight forward issue of firing queries at the database,
> the crash-inducing part of my test harness was not active during this test.
> I also ran it with my crashing patch reversed out, in case I introduced the
> problem myself, and it still occurs.
>
> Using V1.7 of the vallock2 patch, I saw the same thing with some all-null
> rows. I also saw some other issues where two rows with the same key value
> would be present twice in the table (violating the unique constraint) but
> only one of them would appear in the index. I suspect it is caused by the
> same issue as the all-null rows, and maybe I just didn't run v1.8 enough
> times to find that particular manifestation under v1.8.

This is almost certainly a latent bug with approach #2 to value locking, one that has probably been there all along. Semantics and syntax have been a recent focus, and so the probability that I introduced a regression of this nature in any recent revision seems low. I am going to investigate the problem, and hope to have a diagnosis soon.

Once again, thanks!

--
Peter Geoghegan
On Mon, Dec 29, 2014 at 2:29 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> I've attached an example output of querying via index and via full table
> scan, and also the pageinspect output of the blocks which have the 3 rows,
> in case that is helpful.

You might have also included output from pageinspect's bt_page_items() function. Take a look at the documentation patch I just posted if the details are unclear.

Thanks
--
Peter Geoghegan
On Mon, Dec 29, 2014 at 2:29 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> Using the vallock2 version of V1.8, using the test I previously described, I
> get some all-null rows, which my code should never create. Also, the index
> and table don't agree, in this example I find 3 all-null rows in the table,
> but only 2 in the index.

Just to be clear: You haven't found any such issue with approach #1 to value locking, right?

I'm curious about how long it took you to see the issue with #2. Were there any special steps? What were the exact steps involved in turning off the hard crash mechanism you mention? It looks like the condition you describe ought to be highlighted by the script automatically. Is that right? (I don't know any Perl and the script isn't really documented at a high level.)

Thanks
--
Peter Geoghegan
On Mon, Dec 29, 2014 at 9:12 PM, Peter Geoghegan <pg@heroku.com> wrote:
On Mon, Dec 29, 2014 at 2:29 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> Using the vallock2 version of V1.8, using the test I previously described, I
> get some all-null rows, which my code should never create. Also, the index
> and table don't agree, in this example I find 3 all-null rows in the table,
> but only 2 in the index.
Just to be clear: You haven't found any such issue with approach #1 to
value locking, right?
Correct, I haven't seen any problems with approach #1.
I'm curious about how long it took you to see the issue with #2. Were
there any special steps? What were the exact steps involved in turning
off the hard crash mechanism you mention?
Generally the problem will occur early on in the process, and if not then it will not occur at all. I think that is because the table starts out empty, and so a lot of insertions collide with each other. Once the table is more thoroughly populated, most queries take the CONFLICT branch, and therefore two insertion branches are unlikely to collide.
At its simplest, I just use the count_upsert.pl script and your patch and forget all the rest of the stuff from my test platform.
So:
pg_ctl stop -D /tmp/data2; rm /tmp/data2 -r;
../torn_bisect/bin/pg_ctl initdb -D /tmp/data2;
../torn_bisect/bin/pg_ctl start -D /tmp/data2 -o "--fsync=off" -w ;
createdb;
perl count_upsert.pl 8 100000
A run of count_upsert.pl 8 100000 takes about 30 seconds on my machine (8 core), and if it doesn't create a problem then I just destroy the database and start over.
The fsync=off setting is not important; I've seen the problem once without it. I just include it because otherwise the run takes a lot longer.
I've attached another version of the count_upsert.pl script, with some more logging targeted to this particular issue.
The problem shows up like this:
init done at count_upsert.pl line 97.
sum is 1036
count is 9720
seq scan doesn't match index scan 1535 == 1535 and 1 == 6 $VAR1 = [
[
6535,
-21
],
.....
(Thousands more lines follow, as it outputs the entire table twice: once gathered by seq scan, once by bitmap index scan.)
The first three lines are normal; the problem starts with the "seq scan doesn't match" line.
In this case the first problem it ran into was that key 1535 was present once with a count column of 1 (found by seq scan) and once with a count column of 6 (found by index scan). The key was also in the seq scan with a count of 6, but the comparison works by sorting each representation of the table by the key column value and then stopping at the first difference; in this case the count columns 1 == 6 failed the assertion.
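That comparison strategy can be sketched in a few lines of Python (hypothetical names; the real check lives in count_upsert.pl and reads the two scans from PostgreSQL):

```python
# Sketch of the seq-scan-vs-index-scan consistency check described above.
# Each "scan" is modeled as a list of (key, count) rows; the real harness
# gathers one list via a seq scan and the other via a bitmap index scan.

def first_mismatch(seq_rows, idx_rows):
    """Sort both representations by key and report the first difference."""
    seq_sorted = sorted(seq_rows)
    idx_sorted = sorted(idx_rows)
    for i, (s, x) in enumerate(zip(seq_sorted, idx_sorted)):
        if s != x:
            return i, s, x  # position and the two disagreeing rows
    if len(seq_sorted) != len(idx_sorted):
        # One scan returned extra rows beyond the common prefix.
        return min(len(seq_sorted), len(idx_sorted)), None, None
    return None  # the two scans agree

# Mirrors the failure above: key 1535 shows up with count 1 in the
# seq scan but count 6 in the index scan.
seq = [(1535, 1), (1535, 6), (6535, -21)]
idx = [(1535, 6), (6535, -21)]
print(first_mismatch(seq, idx))  # → (0, (1535, 1), (1535, 6))
```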
If you get some all-NULL rows, then you will also get Perl warnings issued when the RETURNING clause starts returning NULL when none are expected to be.
The overall pattern seems to be pretty streaky. It could go 20 iterations with no problem, and then it will fail several times in a row. I've seen this pattern quite a bit with other race conditions as well, I think that they may be sensitive to how memory gets laid out between CPUs, and that might depend on some longer-term characteristic of the state of the machine that survives an initdb.
By the way, I also got a new error message a few times that I think might be a manifestation of the same thing:
ERROR: duplicate key value violates unique constraint "foo_index_idx"
DETAIL: Key (index)=(6106) already exists.
STATEMENT: insert into foo (index, count) values ($2,$1) on conflict (index)
update set count=TARGET.count + EXCLUDED.count returning foo.count
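For what it's worth, the logical effect the harness expects from that statement can be modeled as a simple dictionary accumulation; a sketch of the intended semantics only, nothing server-side:

```python
# Model of what the workload expects from:
#   insert into foo (index, count) values (...)
#   on conflict (index) update set count = TARGET.count + EXCLUDED.count
# i.e. accumulate counts into a mapping keyed by index. The duplicate-key
# error above means the insert arm fired even though a conflicting row
# existed, which this model (and a correct implementation) never does.

def upsert(table, index, count):
    if index in table:          # conflict: fold in the excluded count
        table[index] += count
    else:                       # no conflict: plain insert
        table[index] = count

t = {}
upsert(t, 6106, 5)
upsert(t, 6106, -2)  # conflicts; count becomes 5 + (-2) = 3
print(t)             # → {6106: 3}
```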
Cheers,
Jeff
Attachment
On Mon, Dec 29, 2014 at 11:52 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> Correct, I haven't seen any problems with approach #1

That helps with debugging #2, then. That's very helpful.

> Generally the problem will occur early on in the process, and if not then it
> will not occur at all. I think that is because the table starts out empty,
> and so a lot of insertions collide with each other. Once the table is more
> thoroughly populated, most query takes the CONFLICT branch and therefore two
> insertion-branches are unlikely to collide.
>
> At its simplest, I just use the count_upsert.pl script and your patch and
> forget all the rest of the stuff from my test platform.

I can reproduce this on my laptop now. I think that building at -O2 and without assertions helps. I'm starting to work through debugging it.

I threw together a quick script for getting pg_xlogdump into a Postgres table (a nice use of the new pg_lsn type). It's here:

https://github.com/petergeoghegan/jjanes_upsert/blob/master/pg_xlogdump2csv.py

It tells a story. Looking at the last segment before shutdown when the problem occurred, I see:

postgres=# select count(*), tx from my_xlogdump group by tx having count(*) > 4 order by 1;
 count |   tx
-------+---------
     5 | 1917836
     5 | 1902576
     5 | 1909746
     5 | 1901586
     5 | 1916971
     6 | 1870077
    39 | 1918004
   119 | 1918003
  2246 | 0
(9 rows)

postgres=# select max(tx::text::int4) from my_xlogdump ;
   max
---------
 1918004
(1 row)

So the last two transactions (1918003 and 1918004) get into some kind of live-lock situation, it looks like. Or at least something that causes them to produce significantly more WAL records than other xacts due to some kind of persistent problem with conflicts.
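The SQL aggregations here amount to counting records per transaction and per rmgr. As a rough illustration of the same analysis over already-parsed records (an assumed, simplified representation; the linked pg_xlogdump2csv.py does the real parsing and loading):

```python
# Group pg_xlogdump records by transaction and by resource manager,
# mirroring the "count(*) ... group by tx" and "group by rmgr" queries.
# Records are assumed pre-parsed into (tx, rmgr) pairs for this sketch.
from collections import Counter

records = [
    ("1918003", "Heap"), ("1918003", "Btree"), ("1918003", "Btree"),
    ("1918004", "Heap"), ("0", "Heap2"),
]

per_tx = Counter(tx for tx, _ in records)        # count(*) group by tx
per_rmgr = Counter(rmgr for _, rmgr in records)  # count(*) group by rmgr

# Transactions producing an outsized share of records stand out here,
# just as 1918003 and 1918004 did in the query above.
print(per_tx.most_common(1))   # → [('1918003', 3)]
print(per_rmgr["Btree"])       # → 2
```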
Here is where the earlier of the two problematic transactions has its first record:

postgres=# select * from my_xlogdump where tx = '1918003' order by r_lsn asc limit 1;
 rmgr | len_rec | len_tot |   tx    |   r_lsn    |  prev_lsn  |                       descr
------+---------+---------+---------+------------+------------+----------------------------------------------------
 Heap |       3 |     203 | 1918003 | 0/1783BB70 | 0/1783BB48 | INSERT off 33 blkref #0: rel 1663/16471/12502 blk
(1 row)

After and including that record, up to and including shutdown, here is the rmgr breakdown:

postgres=# select count(*), rmgr from my_xlogdump where r_lsn >= '0/1783BB70' group by rmgr order by 1;
 count |    rmgr
-------+-------------
     1 | XLOG        -- 1 CHECKPOINT_SHUTDOWN record
     2 | Transaction -- commit records for the two xacts
    20 | Heap2       -- all are CLEAN remxid records, tx is 0
    76 | Heap        -- All from our two xacts...
    80 | Btree       -- All from XID 1918003 only
(5 rows)

So it looks like a bad interaction with VACUUM. Maybe it's a problem with VACUUM interlocking. That was my first suspicion, FWIW. I'll need to do more investigating, but I can provide a custom format dump of the table, in case anyone wants to look at what I have here in detail. I've uploaded it to:

http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/files/my_xlogdump.custom.dump.table

--
Peter Geoghegan