Thread: INSERT ... ON CONFLICT IGNORE (and UPDATE) 3.0
Attached patch series forms what I'm calling V3.0 of the INSERT ... ON CONFLICT IGNORE/UPDATE feature. (Sorry about all the threads. I feel this development justifies a new thread, though.)

This features a new way of organizing the code. Heikki and I are now in agreement that the best way of incrementally committing the work is to do ON CONFLICT IGNORE first (perhaps someone else can review ON CONFLICT UPDATE). This patch series is organized so that the first commit adds tests, documentation and code that relate only to the IGNORE variant.

RLS support is still split out as a separate commit, which is not intended to actually be committed separately (from the main ON CONFLICT UPDATE commit). As before, I've just done that to highlight that part for the benefit of RLS subject matter experts. The RTE permissions split patch is also broken out, but I believe there is consensus that it could sensibly be committed on its own.

There are some minor changes to the code/documentation itself:

* Ran the code through pgindent.

* Documented ON CONFLICT UPDATE as a MERGE alternative under "unsupported SQL features" (sql_features.txt).

* Minor tweaks of comments here and there.

* Logical decoding fixes.

* Call ExecCheckPlanOutput() for an ON CONFLICT UPDATE auxiliary query, too.

I should point out that I'm aware of the following open issues around the patch series (most of these apply to the base IGNORE commit, so Heikki should of course look out for them):

* Andres and I are still hashing out details of whether or not certain assumptions made around handling super deletion within logical decoding are safe [1]. Consider the decoding stuff suspect for now.

* There is still an unresolved semantics issue with unique index inference and non-default opclasses. I think it's sufficient that the available/defined unique indexes dictate our idea of a unique violation (that necessitates taking the alternative path).
Even in a world where there exists a non-default opclass with an "equals" operator that does not match that of the default opclass (that is not really the world we live in, because the only counter-example known is made that way specifically to *discourage* its use by users), this seems okay to me. It seems okay to me because surely the relevant definition of equality is the one actually chosen for the available unique index. If there exists an ambiguity for some strange reason (i.e. there are two unique indexes, on the same attribute(s), but with different "equals" operators), then it's a costing issue, so the behavior given is essentially non-predictable (it could end up being either... but hey, those are the semantics).

I have a very hard time imagining how that could ever be the case, even when we have (say) case insensitive opclasses for the text type. A case insensitive opclass is stricter than a case sensitive opclass. Why would a user ever want both on the same attribute(s) of the same table? Is the user really more or less expecting to never get a unique violation on the non-arbitrating unique index, despite all this?

If reviewers are absolutely insistent that this theoretical ambiguity is a problem, we can add an optional CREATE INDEX style opclass specification (I'm already using the IndexElems representation from CREATE INDEX for the inference specification, actually, so that would be easy enough). I really have a hard time believing that the ugliness would be a problem for those hypothetical users that eventually come to consider "equals" operator ambiguity among the opclasses of available unique indexes to be a problem. I haven't just gone and implemented this already because I didn't want to document that an opclass specification will be accepted. Since there is literally no reason why anyone would care today, I see no reason to add what IMV would really just be cruft.

* I'm flying blind with the SEPostgres changes.
Unfortunately, it's not really possible to test SEPostgres on my machine. (Does not apply to the IGNORE variant.)

As always, the jjanes_upsert tool [2] has proven invaluable in testing this patch (thanks, Jeff!). Reviewers should look to that tool for ideas on how the patch might be broken. It caught a number of race conditions with exclusion constraints in the past, for example. We're in good shape with those now (and have been since V2.3 - see the "livelock insurance" comments within the IGNORE commit).

Thoughts?

[1] http://www.postgresql.org/message-id/20150303111342.GF2579@alap3.anarazel.de
[2] https://github.com/petergeoghegan/jjanes_upsert

--
Peter Geoghegan
Attachment
On Wed, Mar 4, 2015 at 5:18 PM, Peter Geoghegan <pg@heroku.com> wrote:
> Attached patch series forms what I'm calling V3.0 of the INSERT ... ON
> CONFLICT IGNORE/UPDATE feature. (Sorry about all the threads. I feel
> this development justifies a new thread, though.)

Bruce Momjian kindly made available a server for stress-testing [1]. I'm using jjanes_upsert for this task (I stopped doing this for a little while, and was in need of a new server).

At the very highest client count I'm testing (128), I see unprincipled deadlocks for the exclusion constraint case very infrequently:

"""""
2015-03-05 14:09:36 EST [ 64987901 ]: ERROR:  deadlock detected
2015-03-05 14:09:36 EST [ 64987901 ]: DETAIL:  Process 7044 waits for ShareLock on promise tuple with token 1 of transaction 64987589; blocked by process 7200.
        Process 7200 waits for ShareLock on transaction 64987901; blocked by process 7044.
        Process 7044: insert into upsert_race_test (index, count) values ('541','-1') on conflict update set count=TARGET.count + EXCLUDED.count where TARGET.index = EXCLUDED.index returning count
        Process 7200: insert into upsert_race_test (index, count) values ('541','-1') on conflict update set count=TARGET.count + EXCLUDED.count where TARGET.index = EXCLUDED.index returning count
2015-03-05 14:09:36 EST [ 64987901 ]: HINT:  See server log for query details.
2015-03-05 14:09:36 EST [ 64987901 ]: STATEMENT:  insert into upsert_race_test (index, count) values ('541','-1') on conflict update set count=TARGET.count + EXCLUDED.count where TARGET.index = EXCLUDED.index returning count
"""""

(Reminder: exclusion constraints doing UPSERTs, and not just IGNOREs, are artificial; this has been enabled within the parser solely for the benefit of this stress-testing. Also, the B-Tree AM does not have nor require "livelock insurance".)

This only happens after just over 30 minutes, while consistently doing 128 client exclusion constraint runs.
This is pretty close to the most stressful thing that you could throw at the implementation, so that's really not too bad.

I believe that this regression occurred as a direct result of adding "livelock insurance". Basically, we cannot be 100% sure that the interleaving of WAL-logged things within and across transactions won't be such that the "only", "oldest" session that gets to wait in the second stage (the second possible call to check_exclusion_or_unique_constraint(), from ExecInsertIndexTuples()) will really be the "oldest" XID. Another *older* xact could just get in ahead of us, waiting on our promise tuple as we wait on its xact end (maybe it updates some third tuple that we didn't see that has already committed... not 100% sure). Obviously XID assignment order does not guarantee that things like heap insertion and index tuple insertion occur in serial XID order, especially with confounding factors like super deletion. And so, every once in a long while, we deadlock.

Now, the very fact that this happens at all actually demonstrates the need for "livelock insurance", IMV. The fact that we reliably terminate is *reassuring*, because livelocks are in general a terrifying possibility. We cannot *truly* solve the unprincipled deadlock problem without adding livelocks, I think. But what we have here is very close to eliminating unprincipled deadlocks, while not also adding any livelocks, AFAICT. I'd argue that that's good enough. Of course, when I remove "livelock insurance", the problem ostensibly goes away (I've waited on such a stress test session for a couple of hours now, so this conclusion seems very likely to be correct).

I think that we should do nothing about this, other than document it as possible in our comments on "livelock insurance". Again, it's very unlikely to be a problem in the real world, especially since B-Trees are unaffected.
Also, I should point out that one of those waiters doesn't look like an insertion-related wait at all: "7200 waits for ShareLock on transaction 64987901; blocked by process 7044". Looks like row locking is an essential part of this deadlock, and ordinarily that isn't even possible for exclusion constraints (unless the patch is hacked to make the parser accept exclusion constraints for an UPSERT, as it has been here). Not quite sure which exact XactLockTableWait() call did this, but I don't think it was any of the ones within check_exclusion_or_unique_constraint(), based on the lack of details (TID and so on are not shown).

[1] http://momjian.us/main/blogs/pgblog/2012.html#January_20_2012

--
Peter Geoghegan
On 5 March 2015 at 23:44, Peter Geoghegan <pg@heroku.com> wrote:
> On Wed, Mar 4, 2015 at 5:18 PM, Peter Geoghegan <pg@heroku.com> wrote:
>> Attached patch series forms what I'm calling V3.0 of the INSERT ... ON
>> CONFLICT IGNORE/UPDATE feature. (Sorry about all the threads. I feel
>> this development justifies a new thread, though.)

Hi,

I had a play with this, mainly looking at the interaction with RLS. (Note there is some bitrot in gram.y that prevents the first patch from applying cleanly to HEAD.)

I tested using the attached script, and one test didn't behave as I expected. I believe the following should have been a valid upsert (following the update path), but actually it failed:

INSERT INTO t1 VALUES (4, 0) ON CONFLICT (a) UPDATE SET b = 1;

AFAICT, it is applying a WITH CHECK OPTION with qual "b > 0 AND a % 2 = 0" to the about-to-be-updated tuple (a=4, b=0), which is wrong because the "b > 0" check (policy p3) should only be applied to the post-update tuple.

Possibly I'm missing something though.

Regards,
Dean
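[Editor's note: Dean's attached test script is not reproduced in the thread, but based on the quals he describes, the setup was presumably something along these lines. This is a hypothetical reconstruction; the policy names and exact expressions are inferred, not taken from the actual attachment.]

```sql
-- Hypothetical reconstruction of the failing test case (policy names
-- and expressions inferred from the quals Dean quotes).
CREATE TABLE t1 (a int PRIMARY KEY, b int);
ALTER TABLE t1 ENABLE ROW LEVEL SECURITY;

CREATE POLICY p1 ON t1 FOR INSERT WITH CHECK (a % 2 = 0);
CREATE POLICY p3 ON t1 FOR UPDATE USING (true) WITH CHECK (b > 0);

INSERT INTO t1 VALUES (4, 0);  -- passes p1: a is even

-- Expected: the conflict on a=4 takes the update path, and the
-- post-update tuple (4, 1) satisfies b > 0, so this should succeed.
-- Observed: it fails, because "b > 0 AND a % 2 = 0" is also applied
-- to the pre-update tuple (4, 0).
INSERT INTO t1 VALUES (4, 0) ON CONFLICT (a) UPDATE SET b = 1;
```
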
Attachment
On 03/05/2015 03:18 AM, Peter Geoghegan wrote:
> Attached patch series forms what I'm calling V3.0 of the INSERT ... ON
> CONFLICT IGNORE/UPDATE feature. (Sorry about all the threads. I feel
> this development justifies a new thread, though.)

I'm still not sure the way the speculative locking works is the best approach. Instead of clearing xmin on super-deletion, a new flag on the heap tuple seems more straightforward. And we could put the speculative insertion token in t_ctid, instead of stashing it in the PGPROC array. That would again seem more straightforward.

A couple of quick random comments:

> /*
>  * plan_speculative_use_index
>  *		Use the planner to decide speculative insertion arbiter index
>  *
>  * Among indexes on target of INSERT ... ON CONFLICT, decide which index to
>  * use to arbitrate taking alternative path.  This should be called
>  * infrequently in practice, because it's unusual for more than one index to
>  * be available that can satisfy a user-specified unique index inference
>  * specification.
>  *
>  * Note: caller had better already hold some type of lock on the table.
>  */
> Oid
> plan_speculative_use_index(PlannerInfo *root, List *indexList)
> {
> ...
> 	/* Locate cheapest IndexOptInfo for the target index */

If I'm reading this correctly, if there are multiple indexes that match the unique index inference specification, we pick the cheapest one. Isn't that unstable? Two backends running the same INSERT ON CONFLICT statement might pick different indexes, and the decision might change over time as the table is analyzed. I think we should have a more robust rule. Could we easily just use all matching indexes?

> ... Deferred unique constraints are not supported as
> +         arbiters of whether an alternative <literal>ON CONFLICT</> path
> +         should be taken.

We really need to find a shorter term for "arbiter of whether an alternative path should be taken". Different variations of that term are used a lot, and it's tedious to read.
> * There is still an unresolved semantics issue with unique index
> inference and non-default opclasses. I think it's sufficient that the
> available/defined unique indexes dictate our idea of a unique
> violation (that necessitates taking the alternative path). Even in a
> world where there exists a non-default opclass with an "equals"
> operator that does not match that of the default opclass (that is not
> really the world we live in, because the only counter-example known is
> made that way specifically to *discourage* its use by users), this
> seems okay to me. It seems okay to me because surely the relevant
> definition of equality is the one actually chosen for the available
> unique index. If there exists an ambiguity for some strange reason
> (i.e. there are two unique indexes, on the same attribute(s), but with
> different "equals" operators), then it's a costing issue, so the
> behavior given is essentially non-predictable (it could end up being
> either...but hey, those are the semantics). I have a very hard time
> imagining how that could ever be the case, even when we have (say)
> case insensitive opclasses for the text type. A case insensitive
> opclass is stricter than a case sensitive opclass. Why would a user
> ever want both on the same attribute(s) of the same table? Is the user
> really more or less expecting to never get a unique violation on the
> non-arbitrating unique index, despite all this?
>
> If reviewers are absolutely insistent that this theoretical ambiguity
> is a problem, we can add an optional CREATE INDEX style opclass
> specification (I'm already using the IndexElems representation from
> CREATE INDEX for the inference specification, actually, so that would
> be easy enough). I really have a hard time believing that the ugliness
> is a problem for those hypothetical users that eventually consider
> "equals" operator ambiguity among opclasses among available unique
> indexes to be a problem.
> I haven't just gone and implemented this
> already because I didn't want to document that an opclass
> specification will be accepted. Since there is literally no reason why
> anyone would care today, I see no reason to add what IMV would really
> just be cruft.

I've been thinking that it would be nice to be able to specify a constraint name. Naming an index directly feels wrong, as in relational and SQL philosophy indexes are just an implementation detail, but naming a constraint is fair game. It would also be nice to be able to specify "use the primary key".

Attached patch contains a few more things I saw at a quick read.

- Heikki
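[Editor's note: to make Heikki's suggestion concrete, one could imagine syntax along these lines. This is purely a hypothetical sketch; no such clause exists in the patch as posted, and the spelling here is invented.]

```sql
-- Hypothetical syntax sketch only: naming the arbiter by constraint
-- rather than inferring it from a column list.
INSERT INTO t1 VALUES (4, 0)
    ON CONFLICT ON CONSTRAINT t1_pkey UPDATE SET b = 1;

-- ...or some spelling of "use the primary key", whatever the primary
-- key columns happen to be:
INSERT INTO t1 VALUES (4, 0)
    ON CONFLICT PRIMARY KEY UPDATE SET b = 1;
```
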
Attachment
On Thu, Mar 5, 2015 at 3:44 PM, Peter Geoghegan <pg@heroku.com> wrote:
> Bruce Momjian kindly made available a server for stress-testing [1].
> I'm using jjanes_upsert for this task (I stopped doing this for a
> little while, and was in need of a new server).

BTW, this was run for about another week on Bruce's server, and no further issues arose affecting either the B-Tree or exclusion constraint implementations. I've stopped with it for now, because it feels unlikely that I'm going to find any more bugs this way.

--
Peter Geoghegan
On Tue, Mar 17, 2015 at 12:11 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> I'm still not sure the way the speculative locking works is the best
> approach. Instead of clearing xmin on super-deletion, a new flag on the heap
> tuple seems more straightforward. And we could put the speculative insertion
> token in t_ctid, instead of stashing it in the PGPROC array. That would
> again seem more straightforward.

I see the appeal of that approach. What concerns me about that approach is that it makes it the problem of the inserter to confirm its insertion, even though overwhelmingly, speculative insertions succeed (the race window is small due to the pre-check). The current speculative token is legitimately a very short-lived thing, a thing that I hesitate to write to disk at all. So our optimistic approach to inserting speculatively loses its optimism, which I don't like, if you know what I mean.

OTOH, apart from avoiding stashing things in PGPROC, I do see another advantage to what you outline. Doing things this way (and making the insertion and confirmation of a speculative insertion a two-phased thing) unambiguously fixes all problems with logical decoding, without adding unexpected cross-change-coordination tricks during transaction reassembly, and really without much further thought. All we do is get a new case to decode, which ultimately produces a REORDER_BUFFER_CHANGE_INSERT change, which Andres more or less wanted to do anyway.

Under this scheme of using t_ctid, heap_insert() would do minimal WAL logging, necessary for the same reasons that some WAL logging is required within heap_lock_tuple() (so the new record type is very similar to the existing, simple xl_heap_lock record type). We'd use one of the two remaining bits within t_infomask2 for this, so that waiters will know to interpret t_ctid as a speculative insertion token (and then wait on it).
Then, after speculative insertion succeeded, we'd WAL-log something closer to an UPDATE to confirm that insertion was in fact successful, which is where the contents of the new tuple are actually finally WAL-logged (like UPDATEs used to be, but without a new tuple being WAL-logged at all, since there of course is none). Is that more or less what you have in mind?

> A couple of quick random comments:
>
>> /*
>>  * plan_speculative_use_index
>>  *		Use the planner to decide speculative insertion arbiter index
>>  *
>>  * Among indexes on target of INSERT ... ON CONFLICT, decide which index to
>>  * use to arbitrate taking alternative path.  This should be called
>>  * infrequently in practice, because it's unusual for more than one index to
>>  * be available that can satisfy a user-specified unique index inference
>>  * specification.
>>  *
>>  * Note: caller had better already hold some type of lock on the table.
>>  */
>> Oid
>> plan_speculative_use_index(PlannerInfo *root, List *indexList)
>> {
>> ...
>> 	/* Locate cheapest IndexOptInfo for the target index */
>
> If I'm reading this correctly, if there are multiple indexes that match the
> unique index inference specification, we pick the cheapest one. Isn't that
> unstable? Two backends running the same INSERT ON CONFLICT statement might
> pick different indexes, and the decision might change over time as the table
> is analyzed. I think we should have a more robust rule. Could we easily just
> use all matching indexes?

Robert feels pretty strongly that there should be a costing aspect to this. Initially I was skeptical of this, but just did what he said all the same. But I've since come around to his view entirely, because we now support partial unique indexes.
You can add a predicate to a unique index inference specification, and you're good, even if there is no partial index on those attributes/expressions exactly matching the DML/inference predicate - we use predicate_implied_by() for this, which works with an equivalent non-partial unique index. This is because a non-partial unique index covers those attributes sufficiently. There may be value in preferring the cheap partial index, and then verifying that it actually did cover the final inserted tuple version (which is how things work now).

If we cannot discriminate against one particular unique index, making sure it covers the inserted heap tuple (by throwing a user-visible ERRCODE_TRIGGERED_ACTION_EXCEPTION error if it doesn't, as I currently do within ExecInsertIndexTuples()) is on shaky ground as a user-visible behavior. I like that behavior, though - it seems less surprising than letting a failure to actually cover a selective partial unique index predicate (that of the one and only arbiter index) slide. You can only get this ERRCODE_TRIGGERED_ACTION_EXCEPTION error when you actually had a predicate in your unique index inference specification in the first place, so it seems likely that you want the error. But I also like supporting unique indexes that are non-partial even in the presence of a predicate in the unique index inference specification clause, because, as I said, that's still correct - I can see not doing this breaking queries when a partial unique index is dropped due to a more general uniqueness requirement.

Besides, having a list of arbiter unique indexes that must all be equivalent for the user's purposes anyway is very awkward indeed. It seems very confusing. If they all have equivalent definitions anyway, as they must, then it must not matter one bit which one you take (except perhaps on cost grounds). It might be unstable, but it couldn't possibly matter, so why bother with anything else?
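[Editor's note: a sketch of the partial-index case described above. Table and index names are illustrative only, and the `ON CONFLICT ... WHERE` syntax is that of the patch under discussion, not of any released PostgreSQL.]

```sql
-- Illustrative only: the inference predicate below can be satisfied
-- by a matching partial unique index...
CREATE TABLE orders (code text, note text, active bool);
CREATE UNIQUE INDEX orders_active_code ON orders (code) WHERE active;

-- ...or by a plain unique index on (code), since the non-partial
-- index covers the predicate trivially (via predicate_implied_by()):
-- CREATE UNIQUE INDEX orders_code ON orders (code);

INSERT INTO orders VALUES ('A', 'updated', true)
    ON CONFLICT (code) WHERE active
    UPDATE SET note = EXCLUDED.note;
```
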
If it *does* matter on semantics grounds, then we have bigger problems. There is a tension between indexes as an implementation detail, and indexes as the thing that defines the semantics for the purposes of this feature (in a way that isn't baked into the DML statement, but is only known from a cataloged definition of an index or its physical structure). I have not fully resolved that tension here, but I think I have a good practical balance between the two extremes. Does that make sense?

>> ... Deferred unique constraints are not supported as
>> +         arbiters of whether an alternative <literal>ON CONFLICT</> path
>> +         should be taken.
>
> We really need to find a shorter term for "arbiter of whether an alternative
> path should be taken". Different variations of that term are used a lot, and
> it's tedious to read.

Okay. I'll think about that.

> I've been thinking that it would be nice to be able to specify a constraint
> name. Naming an index directly feels wrong, as in relational and SQL
> philosophy indexes are just an implementation detail, but naming a
> constraint is fair game. It would also be nice to be able to specify "use
> the primary key".

Now that we have partial unique indexes covered, which never have a constraint (I think - see my remarks above), this is probably okay. But that doesn't really solve the highly theoretical (non-)issue with "equals" operator ambiguity, which I called out as an open issue recently [1]. That's still a thing we need to decide on, even if that means that you agree with me that it doesn't matter (and since this naming of constraints in DML is an escape hatch from that problem too, that seems more likely than before). What do you think?

> Attached patch contains a few more things I saw at a quick read.

This is mostly documentation stuff. I can certainly address this feedback quickly and easily, and will do so soon.
[1] http://www.postgresql.org/message-id/CAM3SWZS9DTFt1=sQT=WFdp5UFkOac2Qc1i5WE-Od4BXJZ=JsCw@mail.gmail.com

--
Peter Geoghegan
On Mon, Mar 16, 2015 at 8:10 AM, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> (Note there is some bitrot in gram.y that prevents the first patch
> from applying cleanly to HEAD)

That's trivially fixable. I'll have those fixes in the next revision, once I firm some things up with Heikki.

> I tested using the attached script, and one test didn't behave as I
> expected. I believe the following should have been a valid upsert
> (following the update path) but actually it failed:
>
> INSERT INTO t1 VALUES (4, 0) ON CONFLICT (a) UPDATE SET b = 1;
>
> AFAICT, it is applying a WITH CHECK OPTION with qual "b > 0 AND a % 2
> = 0" to the about-to-be-updated tuple (a=4, b=0), which is wrong
> because the "b > 0" check (policy p3) should only be applied to the
> post-update tuple.
>
> Possibly I'm missing something though.

I think that you may have. Did you read the commit message/docs of the RLS commit 0004-*? You must consider the second point here, I believe:

""""
The 3 places that RLS policies are enforced are:

* Against row actually inserted, after insertion proceeds successfully (INSERT-applicable policies only).

* Against row in target table that caused conflict. The implementation is careful not to leak the contents of that row in diagnostic messages (INSERT-applicable *and* UPDATE-applicable policies).

* Against the version of the row added to the relation after ExecUpdate() is called (INSERT-applicable *and* UPDATE-applicable policies).
""""

You're seeing a failure that applies to the target tuple of the UPDATE (the tuple that we can't leak the contents of). I felt it was best to check all policies against the target/existing tuple, including both WITH CHECK OPTIONS and USING quals (which are both enforced). I can see why you might not like that behavior, but it is the intended behavior. I thought that this whole intersection of RLS + UPSERT is complex enough that it would be best to be almost as conservative as possible in what fails and what succeeds.
The one exception is when the insert path is actually taken, since the statement is an INSERT statement.

--
Peter Geoghegan
On 17 March 2015 at 23:25, Peter Geoghegan <pg@heroku.com> wrote:
>> Possibly I'm missing something though.
>
> I think that you may have. Did you read the commit message/docs of the
> RLS commit 0004-*? You must consider the second point here, I believe:
>
> """"
> The 3 places that RLS policies are enforced are:
>
> * Against row actually inserted, after insertion proceeds successfully
> (INSERT-applicable policies only).
>
> * Against row in target table that caused conflict. The implementation
> is careful not to leak the contents of that row in diagnostic
> messages (INSERT-applicable *and* UPDATE-applicable policies).
>
> * Against the version of the row added to the relation after
> ExecUpdate() is called (INSERT-applicable *and* UPDATE-applicable
> policies).
> """"

Yes, I read that, and I agree with the intention to not leak data according to both the INSERT and UPDATE policies, however...

> You're seeing a failure that applies to the target tuple of the UPDATE
> (the tuple that we can't leak the contents of). I felt it was best to
> check all policies against the target/existing tuple, including both
> WITH CHECK OPTIONS and USING quals (which are both enforced).

I think that's an incorrect implementation of the RLS UPDATE policy. The WITH CHECK quals of a RLS policy are intended to be applied to the NEW data, not the existing data. This patch is applying the WITH CHECK quals to both the existing and NEW tuples, which runs counter to the way RLS policies are normally enforced, and I think that will just lead to confusion.

> I can see why you might not like that behavior, but it is the intended
> behavior. I thought that this whole intersection of RLS + UPSERT is
> complex enough that it would be best to be almost as conservative as
> possible in what fails and what succeeds. The one exception is when
> the insert path is actually taken, since the statement is an INSERT
> statement.
The problem with that is that the user will see errors saying that the data violates the RLS WITH CHECK policy, when they might quite reasonably argue that it doesn't. That's not really being conservative. I'd argue it's a bug.

Regards,
Dean
On Tue, Mar 17, 2015 at 3:11 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> I've been thinking that it would be nice to be able to specify a constraint
> name. Naming an index directly feels wrong, as in relational and SQL
> philosophy indexes are just an implementation detail, but naming a
> constraint is fair game. It would also be nice to be able to specify "use
> the primary key".

Intuitively, I think you should specify an operator name, not a constraint name. That's what we do for, e.g., exclusion constraints, and it feels right. People sometimes create and drop indexes (and thus, perhaps, the constraints that depend on them) for maintenance reasons, where a change in semantics will be unwelcome. But I don't accept Peter's argument that it's OK to be indifferent to which particular equality semantics are being used.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
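[Editor's note: for reference, the precedent Robert mentions. Exclusion constraints already name the operator for each column explicitly, rather than relying on an index or constraint name. A minimal example; the `room WITH =` part assumes the btree_gist extension is available.]

```sql
-- Exclusion constraints spell out the comparison operator per column:
CREATE EXTENSION IF NOT EXISTS btree_gist;

CREATE TABLE reservations (
    room   int,
    during tsrange,
    -- no two rows may have the same room AND overlapping intervals
    EXCLUDE USING gist (room WITH =, during WITH &&)
);
```
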
On Tue, Mar 17, 2015 at 6:27 PM, Peter Geoghegan <pg@heroku.com> wrote:
> On Tue, Mar 17, 2015 at 12:11 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>> I'm still not sure the way the speculative locking works is the best
>> approach. Instead of clearing xmin on super-deletion, a new flag on the heap
>> tuple seems more straightforward. And we could put the speculative insertion
>> token in t_ctid, instead of stashing it in the PGPROC array. That would
>> again seem more straightforward.
>
> I see the appeal of that approach. What concerns me about that
> approach is that it makes it the problem of the inserter to confirm
> its insertion, even though overwhelmingly, speculative insertions
> succeed (the race window is small due to the pre-check). The current
> speculative token is legitimately a very short-lived thing, a thing
> that I hesitate to write to disk at all. So our optimistic approach to
> inserting speculatively loses its optimism, which I don't like, if you
> know what I mean.

Modifying a shared buffer is not the same as writing to disk.

>> If I'm reading this correctly, if there are multiple indexes that match the
>> unique index inference specification, we pick the cheapest one. Isn't that
>> unstable? Two backends running the same INSERT ON CONFLICT statement might
>> pick different indexes, and the decision might change over time as the table
>> is analyzed. I think we should have a more robust rule. Could we easily just
>> use all matching indexes?
>
> Robert feels pretty strongly that there should be a costing aspect to
> this. Initially I was skeptical of this, but just did what he said all
> the same. But I've since come around to his view entirely, because we
> now support partial unique indexes.

I think Heikki's concern is something different, although I am not altogether up to speed on this and may be confused.
The issue is: suppose that process A and process B are both furiously upserting into the same table with the same key column but, because they have different costing parameters, process A consistently chooses index X and process B consistently chooses index Y. In that situation, will we deadlock, livelock, error out, bloat, or get any other undesirable behavior, or is that A-OK?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 03/18/2015 06:23 PM, Robert Haas wrote:
> On Tue, Mar 17, 2015 at 6:27 PM, Peter Geoghegan <pg@heroku.com> wrote:
>> On Tue, Mar 17, 2015 at 12:11 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>>> I'm still not sure the way the speculative locking works is the best
>>> approach. Instead of clearing xmin on super-deletion, a new flag on the heap
>>> tuple seems more straightforward. And we could put the speculative insertion
>>> token in t_ctid, instead of stashing it in the PGPROC array. That would
>>> again seem more straightforward.
>>
>> I see the appeal of that approach. What concerns me about that
>> approach is that it makes it the problem of the inserter to confirm
>> its insertion, even though overwhelmingly, speculative insertions
>> succeed (the race window is small due to the pre-check). The current
>> speculative token is legitimately a very short-lived thing, a thing
>> that I hesitate to write to disk at all. So our optimistic approach to
>> inserting speculatively loses its optimism, which I don't like, if you
>> know what I mean.
>
> Modifying a shared buffer is not the same as writing to disk.

AFAICS, there is no need to go and clear the tag after the insert has completed. Here's what I had in mind: the inserter tags the tuple with the speculative insertion token, by storing the token in the t_ctid field. If the inserter needs to super-delete the tuple, it sets xmax like in a regular deletion, but also sets another flag to indicate that it was a super-deletion.

When another backend inserts, and notices that it has a potential conflict with the first tuple, it tries to acquire a hw-lock on the token. In most cases, the inserter has long since completed the insertion, and the acquisition succeeds immediately, but you have to check because the token is not cleared on a completed insertion.
It would be slightly faster for the second backend if the inserter actually removed the token after the insertion has completed: it wouldn't need to check the lock if the tuple was not tagged in the first place. But that'd be just a tiny optimization, for the rare case that there is a conflict, and surely not worth it. Hmm, I just realized a little fly in that ointment: if the inserter inserts enough tuples to wrap around the token counter (4 billion?) in a single transaction, another backend that inserts could confuse a new speculative insertion with one that completed a long time ago. The risk seems small enough that we could just accept it. I don't think anything particularly bad would happen on a false positive here. Or we could also use all 48 bits available in the t_ctid to push it truly in the realm of ain't-gonna-happen. Or we could use (relfilenode, block, token) as the lock tag for the SpeculativeInsertionLock. Regarding the physical layout: We can use a magic OffsetNumber value above MaxOffsetNumber to indicate that the t_ctid field stores a token rather than a regular ctid value. And another magic t_ctid value to indicate that a tuple has been super-deleted. The token and the super-deletion flag are quite ephemeral; they are not needed after the inserting transaction has completed, so it's nice to not consume the valuable infomask bits for these things. Those states are conveniently not possible on an updated tuple, when we would need the t_ctid field for its current purpose. >>> If I'm reading this correctly, if there are multiple indexes that match the >>> unique index inference specification, we pick the cheapest one. Isn't that >>> unstable? Two backends running the same INSERT ON CONFLICT statement might >>> pick different indexes, and the decision might change over time as the table >>> is analyzed. I think we should have a more robust rule. Could we easily just >>> use all matching indexes? 
>> >> Robert feels pretty strongly that there should be a costing aspect to >> this. Initially I was skeptical of this, but just did what he said all >> the same. But I've since come around to his view entirely because we >> now support partial unique indexes. > > I think Heikki's concern is something different, although I am not > altogether up to speed on this and may be confused. The issue is: > suppose that process A and process B are both furiously upserting into > the same table with the same key column but, because they have > different costing parameters, process A consistently chooses index X > and process B consistently chooses index Y. In that situation, will > we deadlock, livelock, error out, bloat, or get any other undesirable > behavior, or is that A-OK? Right, that's what I had in mind. - Heikki
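Heikki's t_ctid proposal above can be sketched in simplified form. The struct, constants, and helper names below are illustrative stand-ins, not PostgreSQL's actual definitions: the 6-byte TID (4-byte block number plus 2-byte offset number) carries the speculative token, flagged by an offset value above the legal maximum.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Simplified sketch of the scheme: reuse the 6-byte t_ctid
 * (4-byte block number + 2-byte offset number) to carry a speculative
 * insertion token.  An offset value above the largest legal offset
 * flags the field as holding a token rather than a real TID.
 */
typedef struct ItemPointerSketch
{
    uint32_t block;     /* block number, or the token value */
    uint16_t offset;    /* offset number, or a magic flag value */
} ItemPointerSketch;

#define MAX_OFFSET_NUMBER    0x0FFF                   /* hypothetical legal maximum */
#define SPEC_TOKEN_OFFSET    (MAX_OFFSET_NUMBER + 1)  /* "this holds a token" */
#define SUPER_DELETED_OFFSET (MAX_OFFSET_NUMBER + 2)  /* "tuple was super-deleted" */

static void
set_speculative_token(ItemPointerSketch *tid, uint32_t token)
{
    tid->block = token;
    tid->offset = SPEC_TOKEN_OFFSET;
}

static int
is_speculative(const ItemPointerSketch *tid)
{
    return tid->offset == SPEC_TOKEN_OFFSET;
}

static uint32_t
get_speculative_token(const ItemPointerSketch *tid)
{
    assert(is_speculative(tid));
    return tid->block;
}
```

Since neither state can occur on an updated tuple (whose t_ctid must point at the successor version), no infomask bits are consumed, as the message above notes.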
On Wed, Mar 18, 2015 at 11:41 AM, Heikki Linnakangas <hlinnaka@iki.fi> wrote: > AFAICS, there is no need to go and clear the tag after the insert has > completed. > > Here's what I had in mind: the inserter tags the tuple with the speculative > insertion token, by storing the token in the t_ctid field. If the inserter > needs to super-delete the tuple, it sets xmax like in a regular deletion, > but also sets another flag to indicate that it was a super-deletion. > > When another backend inserts, and notices that it has a potential conflict > with the first tuple, it tries to acquire a hw-lock on the token. In most > cases, the inserter has long since completed the insertion, and the > acquisition succeeds immediately but you have to check because the token is > not cleared on a completed insertion. It would be slightly faster for the > second backend if the inserter actually removed the token after the > insertion has completed: it wouldn't need to check the lock if the tuple was > not tagged in the first place. But that'd be just a tiny optimization, for > the rare case that there is a conflict, and surely not worth it. > > Hmm, I just realized a little fly in that ointment: if the inserter inserts > enough tuples to wrap around the token counter (4 billion?) in a single > transaction, another backend that inserts could confuse a new speculative > insertion with one that completed a long time ago. The risk seems small > enough that we could just accept it. I don't think anything particularly bad > would happen on a false positive here. Or we could also use all 48 bits > available in the t_ctid to push it truly in the realm of ain't-gonna-happen. > Or we could use (relfilenode, block, token) as the lock tag for the > SpeculativeInsertionLock. Maybe we'd use all 48-bits anyway, but it's not a major concern either way. Speculative token locks (by design) are only held for an instant. I think a spurious wait would be entirely benign, and very unlikely. 
> Regarding the physical layout: We can use a magic OffsetNumber value above > MaxOffsetNumber to indicate that the t_ctid field stores a token rather than > a regular ctid value. And another magic t_ctid value to indicate that a > tuple has been super-deleted. The token and the super-deletion flag are > quite ephemeral, they are not needed after the inserting transaction has > completed, so it's nice to not consume the valuable infomask bits for these > things. Those states are conveniently not possible on an updated tuple, when > we would need the t_ctid field for it's current purpose. You seem pretty convinced that this is the way to go. I'll try and produce a revision that works this way shortly. I don't think that it will be all that hard to come up with something to prove the idea out. >> I think Heikki's concern is something different, although I am not >> altogether up to speed on this and may be confused. The issue is: >> suppose that process A and process B are both furiously upserting into >> the same table with the same key column but, because they have >> different costing parameters, process A consistently chooses index X >> and process B consistently chooses index Y. In that situation, will >> we deadlock, livelock, error out, bloat, or get any other undesirable >> behavior, or is that A-OK? > > Right, that's what I had in mind. Oh, I see. I totally failed to understand that that was the concern. I think it'll be fine. The pre-check is going to look for a heap tuple using one or the other of (say) a pair of equivalent indexes. We might miss the heap tuple because we picked an index that had yet to have a physical index tuple inserted, and then hit a conflict on optimistic insertion (the second phase). But there is no reason to think that that won't happen anyway. The ordering of operations isn't critical. 
The one issue you might still have is a duplicate violation, because you happened to infer the unique index that does not get to tolerate unique violations as conflicts that can be recovered from (and there was a race, which is unlikely). I don't really care about this, though. You really are inferring one particular unique index, and for reasons like this I think it would be a mistake to try to pretend that the unique index is 100% an implementation detail. That's why I called the new clause a unique index inference specification. This hypothetical set of unique indexes will always have n-1 redundant unique indexes - they must closely match. That's something that calls into question why the user wants things this way to begin with. Also, note that one unique index will consistently "win", since the insertion order is stable (the relcache puts them in OID order). So it will not be all over the map. -- Peter Geoghegan
On Wed, Mar 18, 2015 at 9:19 AM, Robert Haas <robertmhaas@gmail.com> wrote: > On Tue, Mar 17, 2015 at 3:11 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote: >> I've been thinking that it would be nice to be able to specify a constraint >> name. Naming an index directly feels wrong, as in relational and SQL >> philosophy, indexes are just an implementation detail, but naming a >> constraint is a fair game. It would also be nice to be able to specify "use >> the primary key". > > Intuitively, I think you should specify an operator name, not a > constraint name. That's what we do for, e.g., exclusion constraints, > and it feels right. People sometimes create and drop indexes (and > thus, perhaps, the constraints that depend on them) for maintenance > reasons where a change in semantics will be unwelcome. I think we should use a constraint name. That is the plain reality of what we're doing, and is less ambiguous than an operator. 99% of the time you'll be happy with an unspecified, across-the-board IGNORE, or won't be using exclusion constraints anyway (so we can infer a unique index). A constraint name covers all reasonable cases, since partial unique indexes are otherwise covered (partial unique indexes do not have a pg_constraint entry). Oracle has a hint for ignoring particular, named unique indexes (not constraints). I realize that Oracle hints are not supposed to affect semantics, but this is actually true (Google it). This is a bit ugly, but less ugly than the hint, since as Heikki points out we're only naming a constraint. Naming a constraint reflects the reality of how the feature needs to work, and has a sort of precedent from other systems. > But I don't > accept Peter's argument that it's OK to be indifferent to which > particular equality semantics are being used. I am not suggesting actual indifference makes sense. I am leaving it up to the definition of available indexes. 
And there are no known cases where it could matter anyway, unless you consider the === operator for tuples to be a counter example. And you need multiple conflicting unique indexes on the exact same attributes/expressions on attributes to begin with. Isn't that a highly esoteric thing to have to worry about? Perhaps to the extent that literally no one will ever have to care anyway? It's an oddball use-case, if ever I saw one. Note: the issue of caring about equality semantics across B-Tree opclasses of the same type, and the issue of naming unique indexes are separate issues, AFAICT. No one should confuse them. The only crossover is that the oddball use-case mentioned could use the named constraint thing as an escape hatch. As I've said, I think it's misguided to try to make unique indexes 100% an implementation detail. It's going to fall apart in edge cases, like the one with multiple unique indexes that I mentioned in my last e-mail. No one thinks of them that way, including users. -- Peter Geoghegan
On Wed, Mar 18, 2015 at 3:40 PM, Peter Geoghegan <pg@heroku.com> wrote: >>> I think Heikki's concern is something different, although I am not >>> altogether up to speed on this and may be confused. The issue is: >>> suppose that process A and process B are both furiously upserting into >>> the same table with the same key column but, because they have >>> different costing parameters, process A consistently chooses index X >>> and process B consistently chooses index Y. In that situation, will >>> we deadlock, livelock, error out, bloat, or get any other undesirable >>> behavior, or is that A-OK? >> >> Right, that's what I had in mind. > > Oh, I see. I totally failed to understand that that was the concern. > > I think it'll be fine. The pre-check is going to look for a heap tuple > using one or the other of (say) a pair of equivalent indexes. We might > miss the heap tuple because we picked an index that had yet to have a > physical index tuple inserted, and then hit a conflict on optimistic > insertion (the second phase). But there is no reason to think that > that won't happen anyway. The ordering of operations isn't critical. > > The one issue you might still have is a duplicate violation, because > you happened to infer the unique index that does not get to tolerate > unique violations as conflicts that can be recovered from (and there > was a race, which is unlikely). I don't really care about this, > though. You really are inferring one particular unique index, and for > reasons like this I think it would be a mistake to try to pretend that > the unique index is 100% an implementation detail. That's why I called > the new clause a unique index inference specification. > > This hypothetical set of unique indexes will always have n-1 redundant > unique indexes - they must closely match. That's something that calls > into question why the user wants things this way to begin with. 
Also, > note that one unique index will consistently "win", since the > insertion order is stable (the relcache puts them in OID order). So it > will not be all over the map. I think this is pretty lousy. The reason the user wants things that way is that they created a UNIQUE index and it got bloated somehow with lots of dead tuples. So they made a new UNIQUE index on the same column and then they're planning to do a DROP INDEX CONCURRENTLY on the old one, which is maybe even now in progress. And now they start getting duplicate key failures, the avoidance of which was their whole reason for using UPSERT in the first place. If I were that user, I'd report that as a bug, and if someone told me that it was intended behavior, I'd say "oh, so you deliberately designed this feature to not work some of the time?". ISTM that we need to (1) decide which operator we're using to compare and then (2) tolerate conflicts in every index that uses that operator. In most cases there will only be one, but if there are more, so be it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
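Robert's two-step rule can be sketched roughly as follows. The descriptor type and names are hypothetical simplifications, not the actual planner data structures: once the equality operator implied by the inference specification is decided, every index using that operator becomes an arbiter, rather than only the cheapest one.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy sketch of the rule: collect *every* index whose opclass uses the
 * inferred equality operator, instead of picking the cheapest match.
 * IndexSketch is a hypothetical stand-in for an index descriptor.
 */
typedef struct IndexSketch
{
    int    eq_operator;  /* equality operator used by the index's opclass */
    double cost;         /* planner cost estimate, deliberately unused */
} IndexSketch;

/* Fill "arbiters" with every matching index; return how many matched. */
static size_t
collect_arbiters(const IndexSketch *indexes, size_t nindexes,
                 int inferred_operator,
                 const IndexSketch **arbiters)
{
    size_t nmatched = 0;

    for (size_t i = 0; i < nindexes; i++)
    {
        if (indexes[i].eq_operator == inferred_operator)
            arbiters[nmatched++] = &indexes[i];
    }
    return nmatched;
}
```

Under this rule, the DROP INDEX CONCURRENTLY scenario above stops producing duplicate key failures: both the old and the new UNIQUE index match the operator, so a conflict on either takes the IGNORE/UPDATE path.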
On Thu, Mar 19, 2015 at 1:02 PM, Robert Haas <robertmhaas@gmail.com> wrote: > I think this is pretty lousy. The reasons why the user wants things > that way is because they created a UNIQUE index and it got bloated > somehow with lots of dead tuples. So they made a new UNIQUE index on > the same column and then they're planning to do a DROP INDEX > CONCURRENTLY on the old one, which is maybe even now in progress. And > now they start getting duplicate key failures, the avoidance of which > was their whole reason for using UPSERT in the first place. If I were > that user, I'd report that as a bug, and if someone told me that it > was intended behavior, I'd say "oh, so you deliberately designed this > feature to not work some of the time?". > > ISTM that we need to (1) decide which operator we're using to compare > and then (2) tolerate conflicts in every index that uses that > operator. In most cases there will only be one, but if there are > more, so be it. On reflection, I see your point. I'll try and do something about this too. -- Peter Geoghegan
On Wed, Mar 18, 2015 at 2:41 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote: > Here's what I had in mind: the inserter tags the tuple with the speculative > insertion token, by storing the token in the t_ctid field. If the inserter > needs to super-delete the tuple, it sets xmax like in a regular deletion, > but also sets another flag to indicate that it was a super-deletion. I was able to quickly hack up a prototype of this in my hotel room at pgConf.US. It works fine at first blush, passing the jjanes_upsert stress tests and my own regression tests without a problem. Obviously it needs more testing and clean-up before posting, but I was pleased with how easy this was. > When another backend inserts, and notices that it has a potential conflict > with the first tuple, it tries to acquire a hw-lock on the token. In most > cases, the inserter has long since completed the insertion, and the > acquisition succeeds immediately but you have to check because the token is > not cleared on a completed insertion. You don't even have to check/take a ShareLock on the token when the other xact committed/aborted, because you know that if it is there, then based on that (and based on the fact that it wasn't super deleted) the tuple is visible/committed, or (in the event of other-xact-abort) not visible/aborted. In other words, we continue to only check for a speculative token when the inserting xact is in flight - we just take the token from the heap now instead. Not much needs to change, AFAICT. > Regarding the physical layout: We can use a magic OffsetNumber value above > MaxOffsetNumber to indicate that the t_ctid field stores a token rather than > a regular ctid value. And another magic t_ctid value to indicate that a > tuple has been super-deleted. The token and the super-deletion flag are > quite ephemeral, they are not needed after the inserting transaction has > completed, so it's nice to not consume the valuable infomask bits for these > things. 
Those states are conveniently not possible on an updated tuple, when > we would need the t_ctid field for it's current purpose. Haven't done anything about this yet. I'm just using an infomask2 bit for now. Although that was only because I forgot that you suggested this before having a go at implementing this new t_ctid scheme! My next revision will have a more polished version of this scheme. I'm not going to immediately act on Robert's feedback elsewhere (although I'd like to), owing to time constraints - no reason to deny you the opportunity to review the entirely unrelated low-level speculative locking mechanism due to that. -- Peter Geoghegan
On Wed, Mar 25, 2015 at 12:42 PM, Peter Geoghegan <pg@heroku.com> wrote: > My next revision will have a more polished version of this scheme. I'm > not going to immediately act on Robert's feedback elsewhere (although > I'd like to), owing to time constraints - no reason to deny you the > opportunity to review the entirely unrelated low-level speculative > locking mechanism due to that. Attached revision, V3.1, implements this slightly different way of figuring out if an insertion is speculative, as discussed. We reuse t_ctid for speculatively inserted tuples. That's where the counter goes. I think that this is a significant improvement, since there is no longer any need to touch the proc array for any reason, without there being any significant disadvantage that I'm aware of. I also fixed some bitrot, and a bug with index costing (the details aren't terribly interesting - tuple width wasn't being calculated correctly). I have worked through Heikki's feedback on documentation and code tweaks, too (they were mostly just documentation changes). Notably, I have not: * Added ON CONFLICT PRIMARY KEY (might get to this later) * Otherwise altered the inference specification. I have not allowed it to name a constraint, which Heikki and I favor (Robert wanted to name an operator directly). Again, just haven't got around to it. * Done anything new with logical decoding. My handling of conflicts during transaction reassembly remains questionable. I hope this can be worked out soon, possibly with input from Andres. He is sitting next to me at pgConf.US, so maybe we can work something out in person. * Addressed the concerns held by Heikki and Robert about multiple equivalent unique indexes. I confirmed through testing that this can cause unique violations that are arguably spurious. It isn't too bad, though - with a couple of unique indexes, jjanes_upsert requires high concurrency (i.e. 
128 clients) in order to see one or two or three such unique violations after a minute or so. But even still, let's fix it. As I mentioned, I am at a conference, and obviously there are other time pressures, so I haven't put as much testing into this revision as I'd like. I thought that under the circumstances it was preferable to post what I have sooner rather than later, though. -- Peter Geoghegan
Attachment
On 03/26/2015 08:00 PM, Peter Geoghegan wrote: > On Wed, Mar 25, 2015 at 12:42 PM, Peter Geoghegan <pg@heroku.com> wrote: >> My next revision will have a more polished version of this scheme. I'm >> not going to immediately act on Robert's feedback elsewhere (although >> I'd like to), owing to time constraints - no reason to deny you the >> opportunity to review the entirely unrelated low-level speculative >> locking mechanism due to that. > > Attached revision, V3.1, implements this slightly different way of > figuring out if an insertion is speculative, as discussed. We reuse > t_ctid for speculatively inserted tuples. That's where the counter > goes. I think that this is a significant improvement, since there is > no longer any need to touch the proc array for any reason, without > there being any significant disadvantage that I'm aware of. I also > fixed some bitrot, and a bug with index costing (the details aren't > terribly interesting - tuple width wasn't being calculated correctly). Cool. Quickly looking at the patch though - does it actually work as it is? RelationPutHeapTuple will overwrite the ctid field when the tuple is put on the page, so I don't think the correct token will make it to disk as the patch stands. Also, there are a few places where we currently check if t_ctid equals the tuple's location, and try to follow t_ctid if it doesn't. I think those need to be taught that t_ctid can also be a token. - Heikki
On Thu, Mar 26, 2015 at 2:51 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote: >> Attached revision, V3.1, implements this slightly different way of >> figuring out if an insertion is speculative, as discussed. We reuse >> t_ctid for speculatively inserted tuples. That's where the counter >> goes. I think that this is a significant improvement, since there is >> no longer any need to touch the proc array for any reason, without >> there being any significant disadvantage that I'm aware of. I also >> fixed some bitrot, and a bug with index costing (the details aren't >> terribly interesting - tuple width wasn't being calculated correctly). > > Cool. Quickly looking at the patch though - does it actually work as it is? The test cases pass, including jjanes_upsert, and stress tests that test for unprincipled deadlocks. But yes, I am entirely willing to believe that something that was written in such haste could be broken. My manual testing was pretty minimal. Sorry for posting a shoddy patch, but I thought it was more important to show you that this is perfectly workable ASAP. > RelationPutHeapTuple will overwrite the ctid field when the tuple is put on > the page, so I don't think the correct token will make it to disk as the > patch stands. Oops - You're right. I find it interesting that this didn't result in breaking my test cases. I guess that not having proc array locking might have made the difference with unprincipled deadlocks, which I could not recreate (and row locking saves us from breaking UPSERT, I think - although if so the token lock would still certainly be needed for the IGNORE variant). It is interesting that this wasn't obviously broken for UPSERT, though. I think it at least suggests that when testing, we need to be more careful with taking a working UPSERT as a proxy for a working ON CONFLICT IGNORE. > Also, there are a few places where we currently check if > t_ctid equals the tuple's location, and try to follow t_ctid if it doesn't. 
> I think those need to be taught that t_ctid can also be a token. I did fix at least some of those. I thought that the choke point for doing that was fairly small, entirely confined to one or two routines within heapam.c. But it would surely be better to follow your suggestion of using an invalid/magic tuple offset value to be sure that it cannot possibly occur elsewhere. And I'm still using that infomask2 bit, which is probably not really necessary. So that needs to change too. -- Peter Geoghegan
On 27/03/15 09:14, Peter Geoghegan wrote: > On Thu, Mar 26, 2015 at 2:51 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote: [...] > Oops - You're right. I find it interesting that this didn't result in > breaking my test cases. > [...] Reminds me of the situation where I got an A++ for a COBOL programming assignment that successfully handled the test data provided - then I found a major bug when 'idly' reviewing my code! The lecturer (also a highly experienced developer) was amused when I pointed it out to her, and she said I still deserved the A++! Cheers, Gavin
Hi, Just had a longer chat with Peter about this patch. * Not a fan of the heap flags usage, the reusage seems sketchy to me * Explain should show the arbiter index in text as well * AddUniqueSpeculative is a bad name, it should refer to IndexInfo * Work on the ExecInsert() comments * Let's remove the planner choosing the "cheapest" arbiter index; it should do all. * s/infer_unique_index/infer_arbiter_index/ * Not supporting inheritance properly makes me uncomfortable. I don't think users will think that's an acceptable/reasonable restriction. * Let's not use t_ctid directly, but add a wrapper * The code uses LockTupleExclusive to lock rows. That means the fkey key locking doesn't work; that's annoying. This means that using upsert will potentially cause deadlocks in other sessions :(. I think you'll have to determine what lock to acquire by fetching the tuple, doing the key comparison, and acquire the appropriate lock. That should be fine. * I think we should decouple the insertion and wal logging more. I think the promise tuple insertion should be different from the final insertion of the actual tuple. For one it seems cleaner to me, for another it will avoid the ugliness around logical decoding. I think also that the separation will make it more realistic to use something like this for a COPY variant that doesn't raise unique violations and such. * We discussed the inference and that it should just reuse (code, grammar, docs) the column specification from create index. * Some more stuff I don't recall. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Sat, Mar 28, 2015 at 6:36 PM, Andres Freund <andres@2ndquadrant.com> wrote: > Just had a longer chat with Peter about this patch. It was a very useful chat. Thanks for making yourself available to do it. > * Not a fan of the heap flags usage, the reusage seems sketch to me > * Explain should show the arbiter index in text as well > * AddUniqueSpeculative is a bad name, it should refer IndexInfo > * Work on the ExecInsert() comments > * Let's remove the planner choosing the "cheapest" arbiter index; it > should do all. > * s/infer_unique_index/infer_arbiter_index/ OK. > * Not supporting inheritance properly makes me uncomfortable. I don't > think users will think that's a acceptable/reasonable restriction. I'll look into making the inference specification deduce a child relation index. > * Let's not use t_ctid directly, but add a wrapper We talked about a union. This seems quite doable. > * The code uses LockTupleExclusive to lock rows. That means the fkey key > locking doesn't work; That's annoying. This means that using upsert > will potentially cause deadlocks in other sessions :(. I think you'll > have to determine what lock to acquire by fetching the tuple, doing > the key comparison, and acquire the appropriate lock. That should be > fine. Looking into the implementation of this. As we discussed, the difficulty about avoiding a lock escalation within ExecUpdate() is that we must fetch the row, run EvalPlanQual() to check if the new row version generated by updating will require a LockTupleExclusive or LockTupleNoKeyExclusive, and then lock the row using the right lockmode, and only then call ExecUpdate(). Right now, UPSERT benefits from fetching and locking the row together, so going this way imposes a little additional complexity. It's probably worth it, though. > * I think we should decouple the insertion and wal logging more. I think > the promise tuple insertion should be different from the final > insertion of the actual tuple. 
For one it seems cleaner to me, for > another it will avoid the uglyness around logical decoding. I think > also that the separation will make it more realistic to use something > like this for a COPY variant that doesn't raise unique violations and > such. Your COPY argument swung this for me. I'm looking into the implementation. > * We discussed the infererence and that it should just reuse (code, > grammar, docs) the column specification from create index. Agreed. -- Peter Geoghegan
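The decoupling agreed to here can be pictured as a small state machine, with one WAL record per transition. The states and function names below are illustrative only, not the actual implementation: a promise tuple is inserted and logged first, and a second, separately logged step either confirms it or super-deletes it.

```c
#include <assert.h>

/*
 * Sketch of the decoupled scheme: speculative insertion is two separately
 * WAL-logged steps -- insert a promise tuple, then either confirm it or
 * super-delete it.  Hypothetical names, not PostgreSQL's.
 */
typedef enum SpecState
{
    SPEC_NONE,          /* no tuple yet */
    SPEC_PROMISE,       /* promise tuple inserted: WAL record #1 */
    SPEC_CONFIRMED,     /* insertion finalized: WAL record #2 */
    SPEC_SUPER_DELETED  /* backed out on conflict: WAL record #2 */
} SpecState;

static SpecState
spec_insert_promise(SpecState s)
{
    assert(s == SPEC_NONE);
    return SPEC_PROMISE;    /* would emit the promise-tuple WAL record */
}

static SpecState
spec_finish(SpecState s, int conflict_found)
{
    assert(s == SPEC_PROMISE);
    /* a second WAL record either confirms or super-deletes the tuple */
    return conflict_found ? SPEC_SUPER_DELETED : SPEC_CONFIRMED;
}
```

Because each transition is its own WAL record, logical decoding never has to peek ahead to learn a speculative insertion's fate, which is the ugliness Andres wants to avoid.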
On 03/30/2015 07:20 PM, Peter Geoghegan wrote: > >> >* I think we should decouple the insertion and wal logging more. I think >> > the promise tuple insertion should be different from the final >> > insertion of the actual tuple. For one it seems cleaner to me, for >> > another it will avoid the uglyness around logical decoding. I think >> > also that the separation will make it more realistic to use something >> > like this for a COPY variant that doesn't raise unique violations and >> > such. > Your COPY argument swung this for me. I'm looking into the implementation. I'm pretty sceptical of that. ISTM you'll need to modify the page twice for each insertion, first to insert the promise tuple, and then to turn the promise tuple into a real tuple. And WAL-log both updates. That's going to hurt performance. To recover COPY from unique violations, you can just do the same as INSERT ON CONFLICT IGNORE does, and super-delete the inserted tuple on conflict. To recover from any random error, you'll need to abort the (sub)transaction anyway, and I don't see how it helps to separate the insertion of the promise tuple and the "finalization" of the insertion. - Heikki
On Tue, Mar 31, 2015 at 1:09 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote: > I'm pretty sceptical of that. ISTM you'll need to do modify the page twice > for each insertion, first to insert the promise tuple, and then to turn the > promise tuple into a real tuple. And WAL-log both updates. That's going to > hurt performance. Andres' wish to do things that way is at least partially motivated by having logical decoding just work. The co-ordination I'm currently doing across changes within transaction reassembly is pretty ugly. Andres has strongly suggested that it's broken, too, since a snapshot change could occur between a speculative insertion and its super deletion within transaction reassembly, thus invalidating the assumption that the next change not being a super deletion means there is no such super deletion change (i.e. the insert should be treated as "real"). Anyway, if we don't do this, we'll need to make sure my changes to transaction reassembly are sound. Hopefully that's an easy fix. -- Peter Geoghegan
On Tue, Mar 31, 2015 at 2:26 PM, Peter Geoghegan <pg@heroku.com> wrote: > Andres' wish to do things that way is at least partially motivated by > having logical decoding just work. I should add that there appears to be some need to terminate the loop of speculative token waiting. By that I mean that since we're not looking at the proc array to get a speculative token from HeapTupleSatisfiesDirty() now, there is a livelock hazard. That goes away when the speculative inserter cleans up after itself, as Andres proposed. It would also go away if any speculative waiter cleaned up after the inserter, which you suggested (that would be kind of invasive to places like _bt_doinsert(), though). Finally, it would also work if HeapTupleSatisfiesDirty() tested if the token was still held directly, before reporting a speculative token, by for example attempting to briefly acquire a ShareLock on the token (but that would mean that the extra lock acquisition would be required unless and until someone updated that originally-speculative tuple, in doing so finally changing its t_ctid). I think that we definitely have to do something like this, in any case. Maybe just having SpeculativeTokenWait deal with the cleanup is cleanest, if we're not going to have inserters clean up after themselves immediately per Andres' suggestion. -- Peter Geoghegan
On Mon, Mar 30, 2015 at 9:20 AM, Peter Geoghegan <pg@heroku.com> wrote: >> * The code uses LockTupleExclusive to lock rows. That means the fkey key >> locking doesn't work; That's annoying. This means that using upsert >> will potentially cause deadlocks in other sessions :(. I think you'll >> have to determine what lock to acquire by fetching the tuple, doing >> the key comparison, and acquire the appropriate lock. That should be >> fine. > > Looking into the implementation of this. As we discussed, the > difficulty about avoiding a lock escalation within ExecUpdate() is > that we must fetch the row, run EvalPlanQual() to check if the new row > version generated by updating will require a LockTupleExclusive or > LockTupleNoKeyExclusive, and then lock the row using the right > lockmode, and only then call ExecUpdate(). Right now, UPSERT benefits > from fetching and locking the row together, so going this way imposes > a little additional complexity. It's probably worth it, though. Why do you think deadlocks will be a particular concern? Sure, it could make the difference between deadlocking and not deadlocking, which is bad, but it's not obvious to me that the risk would be any worse than the risk of deadlocking with FKs in 9.2. Is that really all that bad? -- Peter Geoghegan
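The lock-strength selection Andres describes might look roughly like this. It is a toy sketch with integer keys and hypothetical names; the real comparison would use each key column's equality operator, and the lock modes stand in for PostgreSQL's LockTupleNoKeyExclusive/LockTupleExclusive.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Rough sketch of the proposed escalation avoidance: fetch the existing
 * tuple, compare its key columns against the values the UPDATE would
 * write, and only take the stronger row lock when a key column would
 * actually change.  Hypothetical simplification, not the real code.
 */
typedef enum TupleLockSketch
{
    LOCK_NO_KEY_EXCLUSIVE,  /* keys unchanged; FK key-share locks still work */
    LOCK_EXCLUSIVE          /* a key column changes; conflicts with key-share */
} TupleLockSketch;

static TupleLockSketch
choose_lock_mode(const int *old_keys, const int *new_keys, size_t nkeys)
{
    for (size_t i = 0; i < nkeys; i++)
    {
        if (old_keys[i] != new_keys[i])
            return LOCK_EXCLUSIVE;
    }
    return LOCK_NO_KEY_EXCLUSIVE;
}
```

The point of taking the weaker lock when possible is that concurrent foreign-key checks, which key-share lock the referenced row, would then not deadlock against the upsert.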
I attach a revision - V3.2 On Mon, Mar 30, 2015 at 9:20 AM, Peter Geoghegan <pg@heroku.com> wrote: >> * Not supporting inheritance properly makes me uncomfortable. I don't >> think users will think that's a acceptable/reasonable restriction. Inheritance is now supported to the greatest extent anyone could reasonably expect. In other words, inference works just fine for child tables, and now parent tables too (so you can UPSERT both). Of course, because of the existing restriction on making unique constraints cross tables participating in inheritance, UPSERT similarly cannot work across inheritance tables. This should be totally obvious to anyone that uses inheritance - if you won't get a conflict (duplicate violation) based on an ordinary insert, then of course UPSERT will not take the alternative UPDATE path just because someone imagines that a (would-be) duplicate violation should happen. >> * Let's not use t_ctid directly, but add a wrapper > > We talked about a union. This seems quite doable. This now uses a union. And it now actually stores a token value! >> * The code uses LockTupleExclusive to lock rows. That means the fkey key >> locking doesn't work; That's annoying. This means that using upsert >> will potentially cause deadlocks in other sessions :(. I think you'll >> have to determine what lock to acquire by fetching the tuple, doing >> the key comparison, and acquire the appropriate lock. That should be >> fine. > > Looking into the implementation of this. Not quite sold on this, on second thought (although let's focus on the WAL logging stuff - the immediate blocker to committing the IGNORE variant). Perhaps you can explain why you think it's important. I like that I am able to fully lock the row when the predicate isn't passed. I think that's a useful feature in some cases (it particularly makes sense for higher isolation levels that expect to repeat the same command and not get a serialization failure). 
It also keeps the already complicated function ExecLockUpdateTuple() significantly simpler. >> * I think we should decouple the insertion and wal logging more. I think >> the promise tuple insertion should be different from the final >> insertion of the actual tuple. For one it seems cleaner to me, for >> another it will avoid the uglyness around logical decoding. I think >> also that the separation will make it more realistic to use something >> like this for a COPY variant that doesn't raise unique violations and >> such. > > Your COPY argument swung this for me. I'm looking into the implementation. I have a prototype implementation of this with V3.2 - it clearly needs more work, but I thought it was best to post sooner rather than later. I am reusing in-place update infrastructure for this. This should give Heikki something to play with, since he wasn't quite sold on this idea. Certainly, what I have here is not good enough to commit - there is unnecessary double WAL logging of tuple contents, just for example. More generally, my recent changes to heapam.c certainly lack polish. I have something for us to discuss, though, and under the circumstances I think that's a good thing. Grep for "XXX" and "TODO" comments for more. Logical decoding is not handled at all, since I hit a snag with building a tuple from the in-place update WAL records, and I didn't want to block on that (especially given the general uncertainty about if and how to affirm that a speculative insertion succeeded - IOW, whether we should go Andres' way there to avoid making transaction reassembly for decoding more messy). I have at least reverted the logical decoding transaction reassembly peek-ahead thing that Andres hated so much, though. I hope we can reach consensus on what to do on this point of WAL logging/logical decoding in particular real soon now. >> * We discussed the infererence and that it should just reuse (code, >> grammar, docs) the column specification from create index. 
The inference specification now both accepts collation and opclass specifications along the lines we discussed, and can infer multiple unique indexes per Heikki/Robert's complaint (so there is no longer a costing aspect to it at all). There are lots of tests for the inference of collations and opclasses - if you want to know how it works, look at those (e.g. to learn about handling of edge cases with redundant or overlapping cases, perhaps due only to differing collations). I've come up with something very flexible/forgiving, I think. Also, new tests were added for the new inheritance support. The documentation has been updated to reflect all of this. There is still no way to specify a named constraint (which is perhaps useful for exclusion constraints - although perhaps it's good enough to have the IGNORE variant only work with exclusion constraints in the common case where there is no inference specification), or PRIMARY KEY syntax. Still have not made text explain output display arbiter unique indexes (although multiple arbiter unique indexes are now visible from the non-text explain output, since as I mentioned multiple specific unique indexes may now be inferred). This needs some more tweaking. It was helpful with the new tests for inference (with complex variations in collations and opclasses, and multiple unique indexes inferred in a number of cases). I've been meaning to revisit Dean Rasheed's recent remarks on RLS. But that hasn't happened yet. Thanks -- Peter Geoghegan
Attachment
On Tue, Apr 7, 2015 at 10:59 PM, Peter Geoghegan <pg@heroku.com> wrote: > The documentation has been updated to reflect all of this. By the way, for the convenience of reviewers I continue to maintain a mirror of pre-built documentation as outlined here: https://wiki.postgresql.org/wiki/UPSERT#Documentation -- Peter Geoghegan
On Wed, Mar 18, 2015 at 2:59 AM, Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > Yes, I read that, and I agree with the intention to not leak data > according to both the INSERT and UPDATE policies, however... > >> You're seeing a failure that applies to the target tuple of the UPDATE >> (the tuple that we can't leak the contents of). I felt it was best to >> check all policies against the target/existing tuple, including both >> WITH CHECK OPTIONS and USING quals (which are both enforced). >> > > I think that's an incorrect implementation of the RLS UPDATE policy. > The WITH CHECK quals of a RLS policy are intended to be applied to the > NEW data, not the existing data. This patch is applying the WITH CHECK > quals to both the existing and NEW tuples, which runs counter to the > way RLS polices are normally enforced, and I think that will just lead > to confusion. The big idea (the fine details of which Stephen appeared to be in tentative agreement with [1]) is that an UPSERT is a hybrid between an INSERT and an UPDATE, and not simply an INSERT and separate UPDATE tied together. So at the very least the exact behavior of such a hybrid cannot be assumed to be any particular way just from generalizing from known behaviors for INSERT and UPDATE (which is a usability issue, since the fine details of RLS are already complex enough without UPSERT). The INSERT policies are only enforced when a tuple is inserted because, when the alternative path isn't taken, it's really just an INSERT. For the UPDATE path, where the stickiness/hybridness begins, we enforce that the target tuple passes both INSERT policies and UPDATE policies (including USING quals as WCO). The theory here is that if you're entitled to INSERT it, you ought to be entitled to INSERT the existing tuple in order to take the UPDATE path. And we bunch the UPDATE USING quals + WCO together for the sake of (conceptual, not implementation) simplicity - they're already all WCO at that point. 
Finally, the final tuple (generated using the EXCLUDED and TARGET tuples, from the UPDATE) must pass the UPDATE WCO (including any that originated as USING quals, a distinction that "no longer exists") as well as INSERT policies. If you couldn't INSERT the tuple in the first place (when there wasn't a conflict), then why should you be able to UPSERT it? This is substantively the "same" row, no? You (the user) are tacitly asserting that you don't care about whether the INSERT or UPDATE path is taken anyway, so why should you care? Surely you'd want this to fail as early as possible, rather than leaving it to chance. I really do expect that people are only going to do simple transformations in their UPDATE handler (e.g. "ON CONFLICT UPDATE set count = TARGET.count + EXCLUDED.count"), so in practice it usually doesn't matter. Note that other systems that I looked at don't even support RLS with SQL MERGE at all. So we have no precedent to consider that I'm aware of, other than simply not supporting RLS, which would not be outrageous IMV. I felt, given the ambiguity about how this should differ from ordinary INSERTs + UPDATEs, that something quite restrictive but not entirely restrictive (not supporting RLS, just throwing an error all the time) was safest. In any case I doubt that this will actually come up all that often. > The problem with that is that the user will see errors saying that the > data violates the RLS WITH CHECK policy, when they might quite > reasonably argue that it doesn't. That's not really being > conservative. I'd argue it's a bug. Again, I accept that that's a valid interpretation of it. I have my own opinion, but I will take the path of least resistance on this point. What do other people think? I'd appreciate it if you explicitly outlined what policies you feel should be enforce at each of the 3 junctures within an UPSERT (post INSERT, pre-UPDATE, post-UPDATE). 
I would also like you to be very explicit about whether or not RLS WITH CHECK policies and USING quals (presumably enforced as RLS WITH CHECK policies) from both INSERT and UPDATE policies should be enforced at each point. In particular, should I ever not treat RLS WCO and USING quals equivalently? (recall that we don't want to elide an UPDATE silently, which makes much less sense for UPSERT...I had assumed that whatever else, we'd always treat WCO and USING quals from UPDATE (and ALL) policies equivalently, but perhaps not). Alternatively, perhaps you'd prefer to state your objection in terms of the exact modifications you'd make to the above outline of the existing behavior. I don't think I totally follow what you're saying yet (which is the problem with being cleverer generally!). It is easy to explain: The insert path is the same as always. Otherwise, both the before and after tuple have all RLS policies (including USING quals) enforced as WCOs. I think that it might be substantially easier to explain that than to explain what you have in mind...let's see. Thanks [1] http://www.postgresql.org/message-id/20150109214041.GK3062@tamriel.snowman.net -- Peter Geoghegan
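To make the rule I am describing concrete, it can be sketched as a toy model: every applicable policy (INSERT and UPDATE alike, USING quals folded in as WITH CHECK options) is enforced against both the existing tuple and the final tuple on the UPDATE path. This is purely illustrative; the function and policy names are invented, not PostgreSQL internals.

```python
# Hedged sketch of the enforcement rule described above: on the UPDATE
# path, both the existing (target) tuple and the final tuple must pass
# every applicable policy qual, with UPDATE USING quals treated the same
# as WITH CHECK options. All names here are invented for illustration.

def upsert_checks_all(target, final, policies):
    """Raise if either tuple fails any policy; return the final tuple."""
    for tup in (target, final):
        for check in policies:
            if not check(tup):
                raise PermissionError("RLS policy violated")
    return final

# Example: an ownership INSERT policy plus a non-negative-count UPDATE
# USING qual, both enforced as WCOs against both tuples.
result = upsert_checks_all(
    target={"owner": "alice", "n": 5},
    final={"owner": "alice", "n": 6},
    policies=[lambda t: t["owner"] == "alice",  # INSERT WITH CHECK
              lambda t: t["n"] >= 0])           # UPDATE USING, as WCO
print(result)
```

The point of the model is that there is exactly one rule to state, rather than a different combination of checks at each juncture.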
On Tue, Mar 31, 2015 at 7:02 AM, Peter Geoghegan <pg@heroku.com> wrote: >> Andres' wish to do things that way is at least partially motivated by >> having logical decoding just work. Attached revision, V3.3, adds actual support for ON CONFLICT with logical decoding. I've implemented something along the lines Andres wanted. There is now coordination across WAL records - the insert WAL record, and an in-place update record that affirms that the speculative insertion was successful. This does not involve redundantly logging either the heap tuple header (which only appears in the first of two records), or the tuple contents (which only appears in the second of the two records). I stash the tuple header metadata within the inserting xact's ReorderBufferTXN. This approach appears to be robust. Other changes: * Significant refactoring of infer_arbiter_indexes() within the optimizer. More tests, simpler code. * Reworking of ExecInsert() comments, per feedback in NYC from Andres. Also made the control flow there a little simpler. * ON CONFLICT IGNORE is now fully supported on updatable views - an inference specification can be included. ON CONFLICT UPDATE remains unsupported (this can be revisited in a later release IMV). * Dedicated inference element primnode - it wasn't cool that the CREATE INDEX IndexElem parse node made its way into the optimizer (where unique index inference occurs). This simplified some code, too. * Per discussion in NYC with Andres, arbiter indexes now always appear in explain output (not just for non-text explain output). I've also made the changes to explain.c a lot simpler (I've further isolated the kludge used to display quals from the implementation level sequential scan that can appear in auxiliary UPDATE plans). * Misc polishing. Obsolete comments were found in a few places. Fixed build of contrib/pageinspect, that didn't get the memo about the t_ctid union change. I talked privately with Stephen about RLS. 
It seems likely that that needs some behavioral changes, but nothing too invasive. I haven't got around to implementing those yet, but I think they're about to make it to the top of my todo list. In any case, that is still split up into a separate commit (I anticipated that it would be a good idea to do that, since the RLS discussion has yet to fully settle down). Nothing has changed about that commit, though. With a concerted effort, I think we can get this over the line for 9.5. Andres: Please take a look at the logical decoding/WAL stuff. Your input on those aspects would be particularly useful now. I'm now pushing code to github regularly. The feature branch in question is: https://github.com/petergeoghegan/postgres/commits/insert_conflict_ignore Thanks -- Peter Geoghegan
Attachment
On 04/15/2015 07:51 AM, Peter Geoghegan wrote: > +heap_finish_speculative(Relation relation, HeapTuple tuple, bool conflict) > +{ > + if (!conflict) > + { > + /* > + * Update the tuple in-place, in the common case where no conflict was > + * detected during speculative insertion. > + * > + * When heap_insert is called in respect of a speculative tuple, the > + * page will actually have a tuple inserted. However, during recovery > + * replay will add an all-zero tuple to the page instead, which is the > + * same length as the original (but the tuple header is still WAL > + * logged and will still be restored at that point). If and when the > + * in-place update record corresponding to releasing a value lock is > + * replayed, crash recovery takes the final tuple value from there. > + * Thus, speculative heap records require two WAL records. > + * > + * Logical decoding interprets an in-place update associated with a > + * speculative insertion as a regular insert change. In other words, > + * the in-place record generated affirms that a speculative insertion > + * completed successfully. > + */ > + heap_inplace_update(relation, tuple); > + } > + else > + { That's a bizarre solution. May I suggest a much simpler one: Make the heap-insert record the same for normal and speculative insertions, except for a flag that's set if it's a speculative one. Replay as usual. When the speculative insertion is finished, write a new kind of a WAL record for that. The record only needs to contain the ctid of the tuple. Replaying that record will clear the flag on the heap tuple that said that it was a speculative insertion. In logical decoding, decode speculative insertions like any other insertion. To decode a super-deletion record, scan the reorder buffer for the transaction to find the corresponding speculative insertion record for the tuple, and remove it. BTW, that'd work just as well without the new WAL record to finish a speculative insertion. Am I missing something? 
> --- a/src/include/storage/off.h > +++ b/src/include/storage/off.h > @@ -26,6 +26,7 @@ typedef uint16 OffsetNumber; > #define InvalidOffsetNumber ((OffsetNumber) 0) > #define FirstOffsetNumber ((OffsetNumber) 1) > #define MaxOffsetNumber ((OffsetNumber) (BLCKSZ / sizeof(ItemIdData))) > +#define MagicOffsetNumber (MaxOffsetNumber + 1) > #define OffsetNumberMask (0xffff) /* valid uint16 bits */ IMHO it would be nicer if the magic value was more constant, e.g. 0xffff or 0xfffe, instead of basing it on MaxOffsetNumber which depends on block size. I would rather not include MaxOffsetNumber or anything derived from it in the on-disk format. - Heikki
On 2015-04-15 17:58:54 +0300, Heikki Linnakangas wrote: > On 04/15/2015 07:51 AM, Peter Geoghegan wrote: > >+heap_finish_speculative(Relation relation, HeapTuple tuple, bool conflict) > >+{ > >+ if (!conflict) > >+ { > >+ /* > >+ * Update the tuple in-place, in the common case where no conflict was > >+ * detected during speculative insertion. > >+ * > >+ * When heap_insert is called in respect of a speculative tuple, the > >+ * page will actually have a tuple inserted. However, during recovery > >+ * replay will add an all-zero tuple to the page instead, which is the > >+ * same length as the original (but the tuple header is still WAL > >+ * logged and will still be restored at that point). If and when the > >+ * in-place update record corresponding to releasing a value lock is > >+ * replayed, crash recovery takes the final tuple value from there. > >+ * Thus, speculative heap records require two WAL records. > >+ * > >+ * Logical decoding interprets an in-place update associated with a > >+ * speculative insertion as a regular insert change. In other words, > >+ * the in-place record generated affirms that a speculative insertion > >+ * completed successfully. > >+ */ > >+ heap_inplace_update(relation, tuple); > >+ } > >+ else > >+ { > > That's a bizarre solution. I tend to agree, but for different reasons. > In logical decoding, decode speculative insertions like any other insertion. > To decode a super-deletion record, scan the reorder buffer for the > transaction to find the corresponding speculative insertion record for the > tuple, and remove it. Not that easy. That buffer is spilled to disk and such. As discussed. Greetings, Andres Freund
On 04/15/2015 06:01 PM, Andres Freund wrote: > On 2015-04-15 17:58:54 +0300, Heikki Linnakangas wrote: >> On 04/15/2015 07:51 AM, Peter Geoghegan wrote: >>> +heap_finish_speculative(Relation relation, HeapTuple tuple, bool conflict) >>> +{ >>> + if (!conflict) >>> + { >>> + /* >>> + * Update the tuple in-place, in the common case where no conflict was >>> + * detected during speculative insertion. >>> + * >>> + * When heap_insert is called in respect of a speculative tuple, the >>> + * page will actually have a tuple inserted. However, during recovery >>> + * replay will add an all-zero tuple to the page instead, which is the >>> + * same length as the original (but the tuple header is still WAL >>> + * logged and will still be restored at that point). If and when the >>> + * in-place update record corresponding to releasing a value lock is >>> + * replayed, crash recovery takes the final tuple value from there. >>> + * Thus, speculative heap records require two WAL records. >>> + * >>> + * Logical decoding interprets an in-place update associated with a >>> + * speculative insertion as a regular insert change. In other words, >>> + * the in-place record generated affirms that a speculative insertion >>> + * completed successfully. >>> + */ >>> + heap_inplace_update(relation, tuple); >>> + } >>> + else >>> + { >> >> That's a bizarre solution. > > I tend to agree, but for different reasons. > >> In logical decoding, decode speculative insertions like any other insertion. >> To decode a super-deletion record, scan the reorder buffer for the >> transaction to find the corresponding speculative insertion record for the >> tuple, and remove it. > > Not that easy. That buffer is spilled to disk and such. As discussed. Hmm, ok, I've read the "INSERT ... 
ON CONFLICT UPDATE and logical decoding" thread now, and I have to say that IMHO it's a lot more sane to handle this in ReorderBufferCommit() like Peter first did, than to make the main insertion path more complex like this. Another idea is to never spill the latest record to disk, at least if it was a speculative insertion. Then you would be sure that when you see the super-deletion record, the speculative insertion it refers to is still in memory. That seems simple. - Heikki
On 2015-04-15 18:53:15 +0300, Heikki Linnakangas wrote: > Hmm, ok, I've read the "INSERT ... ON CONFLICT UPDATE and logical decoding" > thread now, and I have to say that IMHO it's a lot more sane to handle this > in ReorderBufferCommit() like Peter first did, than to make the main > insertion path more complex like this. I don't like Peter's way much. For one it's just broken. For another it's quite annoying to trigger disk reads to figure out what actual type of record something is. If we go that way that's the way I think it should be done: Whenever we encounter a speculative record we 'unlink' it from the changes that will be reused for spooling from disk and do nothing further. Then we just continue reading through the records. If the next thing we encounter is a super-deletion we throw away that record. If it's another type of change (or even better a 'speculative insertion succeeded' record) insert it. That'll still require some uglyness, but it should not be too bad. I earlier thought that'd not be ok because there could be a new catalog snapshot inbetween, but I was mistaken: The locking on the source transaction prevents problems. > Another idea is to never spill the latest record to disk, at least if it was > a speculative insertion. Then you would be sure that when you see the > super-deletion record, the speculative insertion it refers to is still in > memory. That seems simple. It's not guaranteed to be the last record, there can be records inbetween (snapshots from other backends at the very least). Greetings, Andres Freund
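The change-filtering scheme Andres sketches above can be pictured in miniature like this (a hedged, toy model only - the names and tuple representation are invented, and this is nothing like the real reorder buffer code):

```python
# Toy sketch of the scheme described above: while iterating a
# transaction's changes, hold back a speculative insertion until the next
# change reveals its fate. A following super-deletion means both records
# are dropped; any other change (or end of stream, since an aborted
# speculative insertion would have been super-deleted) means the
# insertion succeeded and is emitted as a plain insert.
# All names (SPECULATIVE_INSERT etc.) are invented for illustration.

SPECULATIVE_INSERT = "speculative_insert"
SUPER_DELETE = "super_delete"
INSERT = "insert"

def filter_speculative(changes):
    """Yield only the changes that should reach the output plugin."""
    pending = None  # speculative insertion awaiting confirmation
    for change in changes:
        if change[0] == SPECULATIVE_INSERT:
            pending = change
        elif change[0] == SUPER_DELETE:
            pending = None  # insertion failed; drop both records
        else:
            if pending is not None:
                yield (INSERT, pending[1])  # implicitly confirmed
                pending = None
            yield change
    if pending is not None:
        yield (INSERT, pending[1])

changes = [(SPECULATIVE_INSERT, "t1"), (SUPER_DELETE, None),
           (SPECULATIVE_INSERT, "t2"), (INSERT, "t3")]
print(list(filter_speculative(changes)))  # t1 dropped, t2 and t3 emitted
```

The real implementation must additionally cope with changes spilled to disk, which is the source of the ugliness mentioned above.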
On 2015-04-15 17:58:54 +0300, Heikki Linnakangas wrote: > When the speculative insertion is finished, write a new kind of a WAL record > for that. The record only needs to contain the ctid of the tuple. Replaying > that record will clear the flag on the heap tuple that said that it was a > speculative insertion. > > In logical decoding, decode speculative insertions like any other insertion. > To decode a super-deletion record, scan the reorder buffer for the > transaction to find the corresponding speculative insertion record for the > tuple, and remove it. > > BTW, that'd work just as well without the new WAL record to finish a > speculative insertion. Am I missing something? I'm, completely independent of logical decoding, of the *VERY* strong opinion that 'speculative insertions' should never be visible when looking with normal snapshots. For one it allows simplifying considerations around wraparound (which has proven to be a very good idea, c.f. multixacts + vacuum causing data corruption years after it was thought to be harmless). For another it allows reclaiming/redefining the bit after a database restart/upgrade. Given how complex this is and how scarce flags are, that seems like a really good property. And avoiding those flags being visible to the outside requires a WAL record, otherwise it won't be correct on the standby. Andres
On 04/16/2015 12:18 PM, Andres Freund wrote: > On 2015-04-15 18:53:15 +0300, Heikki Linnakangas wrote: >> Hmm, ok, I've read the "INSERT ... ON CONFLICT UPDATE and logical decoding" >> thread now, and I have to say that IMHO it's a lot more sane to handle this >> in ReorderBufferCommit() like Peter first did, than to make the main >> insertion path more complex like this. > > I don't like Peter's way much. For one it's just broken. For another > it's quite annoying to trigger disk reads to figure out what actual type > of record something is. > > If we go that way that's the way I think it should be done: Whenever we > encounter a speculative record we 'unlink' it from the changes that will > be reused for spooling from disk and do nothing further. Then we just > continue reading through the records. If the next thing we encounter is > a super-deletion we throw away that record. If it's another type of > change (or even bettter a 'speculative insertion succeeded' record) > insert it. That'll still require some uglyness, but it should not be too > bad. Sounds good to me. - Heikki
On Thu, Apr 16, 2015 at 2:23 AM, Andres Freund <andres@anarazel.de> wrote: > I'm, completely independent of logical decoding, of the *VERY* strong > opinion that 'speculative insertions' should never be visible when > looking with normal snapshots. For one it allows to simplify > considerations around wraparound (which has proven to be a very good > idea, c.f. multixacts + vacuum causing data corruption years after it > was thought to be harmless). For another it allows to reclaim/redefine > the bit after a database restart/upgrade. Given how complex this is and > how scarce flags are that seems like a really good property. > > And avoiding those flags to be visible to the outside requires a WAL > record, otherwise it won't be correct on the standby. I'm a bit distracted here, and not sure exactly what you mean. What's a normal snapshot? Do you just mean that you think that speculative insertions should be explicitly affirmed by a second record (making it not a speculative tuple, but rather, a fully fledged tuple)? IOW, an MVCC snapshot has no chance of seeing a tuple until it was affirmed by a second in-place modification, regardless of tuple xmin xact commit status? -- Peter Geoghegan
On Thu, Apr 16, 2015 at 2:18 AM, Andres Freund <andres@anarazel.de> wrote: > On 2015-04-15 18:53:15 +0300, Heikki Linnakangas wrote: >> Hmm, ok, I've read the "INSERT ... ON CONFLICT UPDATE and logical decoding" >> thread now, and I have to say that IMHO it's a lot more sane to handle this >> in ReorderBufferCommit() like Peter first did, than to make the main >> insertion path more complex like this. > > I don't like Peter's way much. For one it's just broken. For another > it's quite annoying to trigger disk reads to figure out what actual type > of record something is. > > If we go that way that's the way I think it should be done: Whenever we > encounter a speculative record we 'unlink' it from the changes that will > be reused for spooling from disk and do nothing further. Then we just > continue reading through the records. You mean we continue iterating through *changes* from ReorderBufferCommit()? > If the next thing we encounter is > a super-deletion we throw away that record. If it's another type of > change (or even bettter a 'speculative insertion succeeded' record) > insert it. By "insert it", I gather you mean report it to the logical decoding plugin as an insert change. > That'll still require some uglyness, but it should not be too > bad. So, to be clear, what you have in mind is sort of a hybrid between my first and second approaches (mostly my first approach). We'd have coordination between records originally decoded into changes, maybe "intercepting" them during xact reassembly, like my first approach. We'd mostly do the same thing as the first approach, actually. The major difference would be that there'd be explicit "speculative affirmation" WAL records. But we wouldn't rely on those affirmation records within ReorderBufferCommit() - we'd rely on the *absence* of a super deletion WAL record (to report an insert change to the decoding plugin). 
To emphasize, like my first approach, it would be based on an *absence* of a super deletion WAL record. However, like my second approach, there would be a "speculative affirmation" WAL record. The new speculative affirmation WAL record would however be quite different from what my second approach to logical decoding (the in-place update record thing that was in V3.3) actually did. In particular, it would be far simpler, and the tuple would be built from the original insertion record within logical decoding. Right now, I'm tired, so bear with me. What I think I'm not quite getting here is how the new type of "affirmation" record affects visibility (or anything else, actually). Clearly dirty snapshots need to see the record to set a speculative insertion token for their caller to wait on (and HeapTupleSatisfiesVacuum() cannot see the speculatively inserted tuple as reclaimable, of course). They need this *before* the speculative insertion is affirmed, of course. Maybe you mean: If the speculative insertion xact is in progress, then the tuple is visible to several types of snapshots (dirty, vacuum, self, any). If it is not, then tuples are not visible because they are only speculative (and not *confirmed*). If they were confirmed, and the xact was committed, then those tuples are logically and physically indistinguishable from tuples that were inserted in the ordinary manner. Is that it? Basically, the affirmation records have nothing much to do with logical decoding in particular. But you still need to super delete, so that several types of snapshots (dirty, vacuum, self, any) *stop* seeing the tuple as visible independent of the inserting xact being in progress. > I earlier thought that'd not be ok because there could be a new catalog > snapshot inbetween, but I was mistaken: The locking on the source > transaction prevents problems. I thought this was the case. -- Peter Geoghegan
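One reading of the visibility semantics being discussed can be summarized with a toy decision function (a hedged sketch of my reading of the above, with invented names; the real tqual.c logic is of course far more involved, and the normal commit/snapshot rules are elided entirely):

```python
# Toy model of the proposed visibility semantics for speculative tuples:
# dirty/vacuum/self/any snapshots see the tuple while the inserting xact
# is in progress (or once it's confirmed); an MVCC snapshot has no chance
# of seeing it until it has been confirmed. All names are invented.

def visible(snapshot_kind, xact_in_progress, confirmed, super_deleted):
    if super_deleted:
        return False  # super deletion hides the tuple from everyone
    if snapshot_kind in ("dirty", "vacuum", "self", "any"):
        return xact_in_progress or confirmed
    if snapshot_kind == "mvcc":
        return confirmed  # plus the normal MVCC rules, elided here
    raise ValueError(snapshot_kind)

print(visible("dirty", True, False, False))  # True: waiters see the token
print(visible("mvcc", True, False, False))   # False: not yet confirmed
```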
On 9 April 2015 at 22:18, Peter Geoghegan <pg@heroku.com> wrote: > The big idea (the fine details of which Stephen appeared to be in > tentative agreement with [1]) is that an UPSERT is a hybrid between an > INSERT and an UPDATE, and not simply an INSERT and separate UPDATE > tied together. So at the very least the exact behavior of such a > hybrid cannot be assumed to be any particular way just from > generalizing from known behaviors for INSERT and UPDATE (which is a > usability issue, since the fine details of RLS are already complex > enough without UPSERT). > I think that you're over-complicating this. From a usability point of view, I think that the best approach is to keep this as simple as possible and make the behaviour consistent with an INSERT and an UPDATE tied together, as is suggested by the new syntax. The key point is that, if you are using the RLS INSERT and UPDATE policies for this new command, the rule should be that the user has permission to insert/update a new/existing row if and only if they would have had permission to do so using a stand-alone INSERT/UPDATE command. On the other hand, if you believe that the behaviour should be something other than that, then I think that it would need a new dedicated kind of RLS policy for this command because, as I will attempt to explain below, merging together the quals from INSERT and UPDATE policies leads to logical problems. > The INSERT policies are only enforced when a tuple is inserted > because, when the alternative path isn't taken then it's really just > an INSERT. > Agreed. > For the UPDATE path, where the stickiness/hybridness begins... > <snip> > I'd appreciate it if you explicitly outlined what policies you feel > should be enforce at each of the 3 junctures within an UPSERT (post > INSERT, pre-UPDATE, post-UPDATE). 
> I would also like you to be very > explicit about whether or not RLS WITH CHECK policies and USING quals > (presumably enforced as RLS WITH CHECK policies) from both INSERT and > UPDATE policies should be enforced at each point. In particular, > should I ever not treat RLS WCO and USING quals equivalently? (recall > that we don't want to elide an UPDATE silently, which makes much less > sense for UPSERT...I had assumed that whatever else, we'd always treat > WCO and USING quals from UPDATE (and ALL) policies equivalently, but > perhaps not). OK, I'll try to explicitly outline how I think this ought to work: For INSERTs and UPDATEs, there are 3 kinds of RLS quals that come into play: 1). insertCheckQuals - the logical OR of the quals from all INSERT WITH CHECK policy clauses. These give the user permission to insert into the table, provided that the new rows satisfy at least one of these quals. 2). updateUsingQuals - the logical OR of the quals from all UPDATE USING policy clauses. These give the user permission to update existing rows in the table, where the existing rows satisfy at least one of these quals. 3). updateCheckQuals - the logical OR of the quals from all UPDATE WITH CHECK policy clauses. These give the user permission to update the table, provided that the new rows satisfy at least one of these quals. In general these may all differ from one another. If we go forward with the idea that RLS quals should be checked before attempting to insert/update any data, as we do for regular permission checks, then stand-alone INSERT and UPDATE commands would work conceptually as follows: INSERT: 1. Check NEW against insertCheckQuals (error if the result is not true) 2. Do the actual insert of NEW UPDATE: 1. Check OLD against updateUsingQuals (skip rows that don't match) 2. Check NEW against updateCheckQuals (error if the result is not true) 3. Do the actual update (OLD -> NEW) Of course that's an over-simplification. 
The updateUsingQuals are actually merged with the user-supplied WHERE clause in a way dependent on the presence of leaky functions, but conceptually it matches the above description. I think that there is universal agreement that an INSERT .. ON CONFLICT UPDATE that follows the insert path ought to match the pure INSERT case, and only check the insertCheckQuals. That just leaves the question of what to do on the update path, where things get more complicated because there are 3 tuples involved in the process:

1). t1 - the tuple originally proposed for insertion, but which could not be inserted due to a conflict with an existing row in the table (aka EXCLUDED).

2). t2 - the existing row in the table that prevented t1 from being inserted (aka TARGET).

3). t3 - the final new row resulting from the update path. In general, we allow this to be quite different from t1 and t2.

If we think of INSERT .. ON CONFLICT UPDATE as an INSERT and an UPDATE tied together, then logically the following would happen if the update path were taken:

INSERT .. ON CONFLICT UPDATE (update path):
1. Check t1 against insertCheckQuals (error if not true)
2. Detect that t1 conflicts with t2
3. Test user-supplied auxiliary WHERE clause (skip if not true)
4. Check t2 against updateUsingQuals (error if not true)
5. Check t3 against updateCheckQuals (error if not true)
6. Do the actual update (t2 -> t3)

Step (4) is the only point where this differs from an INSERT and an UPDATE tied together, and only in the fact that it throws an error rather than skipping the row to be updated if the user doesn't have permission to do so. The important point is that this is consistent with the rule that it gives the user permission to insert/update a new/existing row if and only if they would have been allowed to do so using a stand-alone INSERT/UPDATE command. Note that there are 3 participating tuples, and each is checked against precisely one of the 3 relevant policies.
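The six-step update path above can be modeled the same illustrative way (Python stand-ins, not PostgreSQL code; t1 = EXCLUDED, t2 = TARGET, t3 = the final row, and all parameter names are invented):

```python
class RLSError(Exception):
    pass

def upsert_update_path(t1, t2, make_t3, where,
                       insert_check, update_using, update_check):
    # 1. Check t1 against insertCheckQuals (error if not true)
    if not insert_check(t1):
        raise RLSError("t1 violates INSERT WITH CHECK policy")
    # 2. (the caller has already detected that t1 conflicts with t2)
    # 3. Test the user-supplied auxiliary WHERE clause (skip if not true)
    if not where(t1, t2):
        return t2  # no update performed
    # 4. Check t2 against updateUsingQuals (error, not skip, if not true)
    if not update_using(t2):
        raise RLSError("t2 violates UPDATE USING policy")
    t3 = make_t3(t1, t2)
    # 5. Check t3 against updateCheckQuals (error if not true)
    if not update_check(t3):
        raise RLSError("t3 violates UPDATE WITH CHECK policy")
    # 6. Do the actual update (t2 -> t3)
    return t3
```

Each of the three tuples is checked against exactly one of the three kinds of quals, which is the property the surrounding argument turns on.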
That is important, as I will attempt to explain. As it stands, your patch classifies WCOs by command type and combines them in a way that results in 4 additional RLS checks being performed (not really in this order):

5.1. Check t2 against insertCheckQuals (error if not true)
5.2. Check t2 against updateCheckQuals (error if not true)
5.3. Check t3 against insertCheckQuals (error if not true)
5.4. Check t3 against updateUsingQuals (error if not true)

Some of those additional checks really don't make any sense, e.g., (5.2) prevents you from updating an existing tuple if you wouldn't have had permission to make an update that resulted in that tuple, which just seems completely backwards. But the real problem is that by applying multiple checks to the same tuple, you're implicitly ANDing together the quals from different RLS policy types and the problem with that is that it's possible for the quals from different policy types to be completely incompatible (their logical AND may be identically equivalent to false). So by doing these additional checks it may be impossible for any tuple to ever satisfy all the checks at the same time, making the command unusable. So the general rule, as I alluded to above, should be that no 2 different kinds of RLS quals should ever be applied to the same tuple. The nub of the problem is that you're classifying WCOs by command type, which is wrong because updateUsingQuals have different semantics from updateCheckQuals, even though they both originate from the UPDATE policy. If you factor in updatable SB views, which we might hope will one day support INSERT .. ON CONFLICT UPDATE, there will be additional WCOs arising from the view, which will have different semantics again (the SQL spec says that they should apply after the insert/update to ensure that the final result is visible in the view). These are then another separate kind of WCO (that doesn't depend on the command type).
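The incompatibility problem described above can be seen with a toy example: two quals from different policy kinds that are each individually satisfiable, but whose implicit AND can never be satisfied by any one tuple (the predicates and data are invented for illustration):

```python
# Each policy kind's quals are individually satisfiable...
insert_check = lambda row: row["dept"] == "sales"    # from INSERT WITH CHECK
update_check = lambda row: row["dept"] == "support"  # from UPDATE WITH CHECK

rows = [{"dept": d} for d in ("sales", "support", "hr")]
assert any(insert_check(r) for r in rows)
assert any(update_check(r) for r in rows)

# ...but their implicit AND is identically false: no single tuple can
# ever pass both checks, so a command applying both to one tuple is
# unusable for these policies.
assert not any(insert_check(r) and update_check(r) for r in rows)
```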
I think this will actually be quite straightforward, as long as all the different kinds of WCO aren't lumped together. If you take a look at my patch to apply RLS checks before insert/update, you'll see that I've added a WCOKind enum that allows each of these kinds of WCOs to be treated differently and checked at different stages in the executor. I anticipated that INSERT .. ON CONFLICT UPDATE would extend that by adding a new kind of WCO for updateUsingQuals checked as if they were WCOs, allowing them to be checked at a specific point in the new code (step (4) above). In all of this, I think we should try to keep things as simple as possible, to give the end user a chance to understand it --- although I'm not sure I've achieved that with my explanation :-) Regards, Dean
On 2015-04-16 10:25:29 -0700, Peter Geoghegan wrote: > On Thu, Apr 16, 2015 at 2:18 AM, Andres Freund <andres@anarazel.de> wrote: > > If we go that way that's the way I think it should be done: Whenever we > > encounter a speculative record we 'unlink' it from the changes that will > > be reused for spooling from disk and do nothing further. Then we just > > continue reading through the records. > > You mean we continue iterating through *changes* from ReorderBufferCommit()? changes, records, not much of a difference here. > > If the next thing we encounter is > > a super-deletion we throw away that record. If it's another type of > > change (or even better a 'speculative insertion succeeded' record) > > insert it. > > By "insert it", I gather you mean report to the logical decoding > plugin as an insert change. Yes. > We'd have coordination between records originally decoded into > changes, maybe "intercepting" them during xact reassembly, like my > first approach. We'd mostly do the same thing as the first approach, > actually. The major difference would be that there'd be explicit > "speculative affirmation" WAL records. But we wouldn't rely on those > affirmation records within ReorderBufferCommit() - we'd rely on the > *absence* of a super deletion WAL record (to report an insert change > to the decoding plugin). To emphasize, like my first approach, it > would be based on an *absence* of a super deletion WAL record. > However, like my second approach, there would be a "speculative > affirmation" WAL record. I think there should be one, but it's not required for the approach. The 'pending speculative insertion' can just be completed whenever there's an insert/update/delete that's not a super deletion. I.e. in the REORDER_BUFFER_CHANGE_INSERT/...
case ReorderBufferCommit() just add something like:

if (pending_insertion != NULL)
{
    if (new_record_is_super_deletion)
        ReorderBufferReturnTupleBuf(pending_insertion);
    else
        rb->apply_change(rb, txn, relation, pending_insertion);
}
...

You'll have to be careful to store the pending_insertion *after* ReorderBufferToastReplace(), but that should not be a problem. > Right now, I'm tired, so bear with me. What I think I'm not quite > getting here is how the new type of "affirmation" record affects > visibility (or anything else, actually). Clearly dirty snapshots need > to see the record to set a speculative insertion token for their > caller to wait on (and HeapTupleSatisfiesVacuum() cannot see the > speculatively inserted tuple as reclaimable, of course). They need > this *before* the speculative insertion is affirmed, of course. > > Maybe you mean: If the speculative insertion xact is in progress, then > the tuple is visible to several types of snapshots (dirty, vacuum, > self, any). If it is not, then tuples are not visible because they are > only speculative (and not *confirmed*). If they were confirmed, and > the xact was committed, then those tuples are logically and physically > indistinguishable from tuples that were inserted in the ordinary > manner. > > Is that it? Basically, the affirmation records have nothing much to do > with logical decoding in particular. But you still need to super > delete, so that several types of snapshots ((dirty, vacuum, self, any) > *stop* seeing the tuple as visible independent of the inserting xact > being in progress. I have no idea what this has to do with the email you responded to? Maybe it's more meant as a response to my separate email that I want the HEAP_SPECULATIVE to be cleared? Greetings, Andres Freund
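The scheme Andres sketches can be modeled roughly like this: a Python stand-in for the ReorderBuffer replay loop, with invented change-kind names, not the actual C implementation. A speculative insertion is held back as "pending"; a following super-deletion discards it, and any other change (or the end of the transaction) confirms it as a plain insert:

```python
def replay(changes, emit):
    """Replay (kind, data) changes, resolving speculative insertions."""
    pending = None
    for kind, data in changes:
        if kind == "spec_insert":
            assert pending is None  # at most one in flight per transaction
            pending = data
        elif kind == "spec_delete":
            # the speculative insertion was superseded; drop it
            pending = None
        else:
            if pending is not None:
                # a later ordinary change means the speculative insert
                # succeeded: report it as a plain insert first
                emit(("insert", pending))
                pending = None
            emit((kind, data))
    if pending is not None:
        # all changes drained with no super-deletion: it succeeded
        emit(("insert", pending))
```

The real code would also have to store the pending change only after toast reassembly, as noted above, but that detail is omitted here.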
On 2015-04-16 09:43:54 -0700, Peter Geoghegan wrote: > On Thu, Apr 16, 2015 at 2:23 AM, Andres Freund <andres@anarazel.de> wrote: > > I'm, completely independent of logical decoding, of the *VERY* strong > > opinion that 'speculative insertions' should never be visible when > > looking with normal snapshots. For one it allows to simplify > > considerations around wraparound (which has proven to be a very good > > idea, c.f. multixacts + vacuum causing data corruption years after it > > was thought to be harmless). For another it allows to reclaim/redefine > > the bit after a database restart/upgrade. Given how complex this is and > > how scarce flags are that seems like a really good property. > > > > And avoiding those flags to be visible to the outside requires a WAL > > record, otherwise it won't be correct on the standby. > > I'm a bit distracted here, and not sure exactly what you mean. What's > a normal snapshot? Normal visibility semantics, i.e. SnapshotMVCC, not SnapshotDirty. > Do you just mean that you think that speculative insertions should be > explicitly affirmed by a second record (making it not a speculative > tuple, but rather, a fully fledged tuple)? IOW, an MVCC snapshot has > no chance of seeing a tuple until it was affirmed by a second in-place > modification, regardless of tuple xmin xact commit status? Yes. I think a) HEAP_SPECULATIVE should never be visible outside in a committed transaction. That allows us to redefine what exactly the bit means and such after a simple restart. On IM Heikki said he wants to replace this by a bit in the item pointer, but I don't think that changes things substantially. b) t_ctid should not contain a speculative token in a committed (i.e. visible to other sessions using mvcc semantics) tuple. Right now (i.e. master) t_ctid will point to itself for non-updated tuples. I don't think it's good to have something in there that's not an actual ctid.
The number of places that look into t_ctid for in-progress insertions of other sessions is smaller than the ones that look at tuples in general. c) Cleaning the flag/ctid after a completed speculative insertion makes it far less likely that we end up waiting on another backend when we wouldn't have to. If a session inserts a large number of tuples speculatively (surely *not* an unlikely thing in the long run) it gets rather likely that tokens are reused. Which means if another backend touches these in-progress records it's quite possible that the currently acquired token is the same as the one on a tuple that has actually finished inserting. Greetings, Andres Freund
Dean, * Dean Rasheed (dean.a.rasheed@gmail.com) wrote: > In all of this, I think we should try to keep things as simple as > possible, to give the end user a chance to understand it --- although > I'm not sure I've achieved that with my explanation :-) Thanks a lot for this. It goes along with my thinking also and matches, I believe, what I had explained to Peter on our call. Peter, please let me know if you agree. Dean, I've been working through your patches over the past couple of days (apologies for the lack of updates, just been busy) and hope to push them very shortly (ie: by the end of the weekend). One thing that I was hoping to discuss a bit is that I've gone ahead and added another set of hooks, so we can have both "permissive" and "restrictive" policies be provided from the hook. It's a bit late to make the grammar and other changes which would be required to add a "restrictive" policy option to the built-in RLS, but adding the hooks is relatively low-impact. I'm also going to be including a test_rls_hooks module into src/test/modules which will test those hooks and provide an example of how to use them. As for the discussion- there was some concern raised about extensions being able to "override" built-in policies by using the hooks a certain way. I don't entirely follow the logic behind that concern as an extension has the ability to read the files on disk directly from C code, should it be written to do so, and so not providing a hook to add "permissive" policies is denying useful functionality for very questionable gain, in my view at least. Thoughts? Thanks! Stephen
On Fri, Apr 17, 2015 at 4:54 AM, Stephen Frost <sfrost@snowman.net> wrote: > Dean, I've been working through your patches over the past couple of > days (apologies for the lack of updates, just been busy) and hope to > push them very shortly (ie: by the end of the weekend). > > One thing that I was hoping to discuss a bit is that I've gone ahead and > added another set of hooks, so we can have both "permissive" and > "restrictive" policies be provided from the hook. It's a bit late to > make the grammar and other changes which would be required to add a > "restrictive" policy option to the built-in RLS, but adding the hooks is > relatively low-impact. I came up with something that is along the lines of what we discussed. I'll wait for you to push Dean's code, and rebase on top of that before submitting what I came up with. Maybe this can be rolled into my next revision if I work through Andres' most recent feedback without much delay. -- Peter Geoghegan
On Fri, Apr 17, 2015 at 1:38 AM, Andres Freund <andres@anarazel.de> wrote: >> Do you just mean that you think that speculative insertions should be >> explicitly affirmed by a second record (making it not a speculative >> tuple, but rather, a fully fledged tuple)? IOW, an MVCC snapshot has >> no chance of seeing a tuple until it was affirmed by a second in-place >> modification, regardless of tuple xmin xact commit status? > > Yes. I think Good. > a) HEAP_SPECULATIVE should never be visible outside in a > committed transaction. That allows us to redefine what exactly the bit > means and such after a simple restart. On IM Heikki said he wants to > replace this by a bit in the item pointer, but I don't think that > changes things substantially. I guess you envisage that HEAP_SPECULATIVE is an infomask2 bit? I haven't been using one of those for a couple of revisions now, preferring to use the offset MagicOffsetNumber to indicate a speculative t_ctid value. I'm slightly surprised that Heikki now wants to use an infomask2 bit (if that is what you mean), because he wanted to preserve those by doing the MagicOffsetNumber thing. But I guess we'd still be preserving the bit under this scheme (since it's always okay to do something different with the bit after a restart). Why is it useful to consume an infomask2 bit after all? Why did Heikki change his mind - due to wanting representational redundancy? Or do I have it all wrong? > b) t_ctid should not contain a speculative token in committed > (i.e. visible to other sessions using mvcc semantics) tuple. Right now > (i.e. master) t_ctid will point to itself for non-updated tuples. I > don't think it's good to have something in there that's not an actual > ctid in there. The number of places that look into t_ctid for > in-progress insertions of other sessions is smaller than the ones that > look at tuples in general. Right.
So if a tuple is committed, it should not have set HEAP_SPECULATIVE/have a t_ctid offset of MagicOffsetNumber. But a non-ctid t_ctid (a speculative token) remains possible for non-committed tuples visible to some types of snapshots (in particular, dirty snapshots). > c) Cleaning the flag/ctid after a completed speculative insertion makes > it far less likely that we end up waiting on another backend when we > wouldn't have to. If a session inserts a large number of tuples > speculatively (surely *not* an unlikely thing in the long run) it gets > rather likely that tokens are reused. Which means if another backend > touches these in-progress records it's quite possible that the currently > acquired token is the same as the one on a tuple that has actually > finished inserting. It's more important than that, actually. If the inserter fails to clear its tuple's speculative token, and then releases their token lmgr lock, it will cause livelock with many upserting sessions. Coordinating which other session gets to lazily clear the speculative token (cleaning up after the original inserter) seemed quite hazardous when I looked into it. Maybe you could fix it by interleaving buffer locks and lmgr locks, but we can't do that. -- Peter Geoghegan
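Point (c) above can be illustrated with a toy model: because speculative tokens are reused, a stale token left on an already-inserted tuple is indistinguishable from an in-flight one, so a reader can wait needlessly. All names here are invented for illustration, not PostgreSQL code:

```python
class Tuple:
    def __init__(self, token=None):
        self.spec_token = token  # speculative token, or None if cleared

def finish_insert(tup, clear_token=True):
    # what the patch does on completion: scrub the token from the tuple
    if clear_token:
        tup.spec_token = None

def must_wait(tup, tokens_in_flight):
    # a reader waits only if the tuple still carries an in-flight token
    return tup.spec_token is not None and tup.spec_token in tokens_in_flight

# Inserter "forgets" to clear token 42, then reuses 42 for a new insertion:
stale = Tuple(token=42)
finish_insert(stale, clear_token=False)
assert must_wait(stale, tokens_in_flight={42})   # spurious wait on a done tuple

# With the token cleared on completion, no spurious wait occurs:
cleared = Tuple(token=42)
finish_insert(cleared)
assert not must_wait(cleared, tokens_in_flight={42})
```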
On 04/17/2015 09:02 PM, Peter Geoghegan wrote: > I'm slightly surprised that Heikki now wants > to use an infomask2 bit (if that is what you mean), No, I don't. > because he wanted to preserve those by doing the MagicOffsetNumber > thing. Yes, that's the way to go. Glad we cleared that up :-). - Heikki
On Fri, Apr 17, 2015 at 11:19 AM, Heikki Linnakangas <hlinnaka@iki.fi> wrote: >> because he wanted to preserve those by doing the MagicOffsetNumber >> thing. > > > Yes, that's the way to go. > > Glad we cleared that up :-). Got it, thanks! -- Peter Geoghegan
On Fri, Apr 17, 2015 at 1:30 AM, Andres Freund <andres@anarazel.de> wrote: >> However, like my second approach, there would be a "speculative >> affirmation" WAL record. > > I think there should be one, but it's not required for the approach. The > 'pending speculative insertion' can just be completed whenever there's a > insert/update/delete that's not a super deletion. > > I.e. in the REORDER_BUFFER_CHANGE_INSERT/... case ReorderBufferCommit() > just add something like:
>
> if (pending_insertion != NULL)
> {
>     if (new_record_is_super_deletion)
>         ReorderBufferReturnTupleBuf(pending_insertion);
>     else
>         rb->apply_change(rb, txn, relation, pending_insertion);
> }
> ...
>
> You'll have to be careful to store the pending_insertion *after* > ReorderBufferToastReplace(), but that should not be a problem.

Okay. It seems like what you're saying is that I should be prepared to have to deal with a REORDER_BUFFER_CHANGE_INTERNAL_SNAPSHOT change (or multiple such changes) from within ReorderBufferCommit() between a speculative insertion and a super deletion, but that I can safely assume that once some other insert/update/delete is encountered (or once all changes have been drained from the reorder buffer), I can then apply the speculative insertion as a regular insertion. Is that what you're saying, in a nutshell? IOW, when you said this: """ I earlier thought that'd not be ok because there could be a new catalog snapshot inbetween, but I was mistaken: The locking on the source transaction prevents problems. """ What you meant was that you'd decided that this pattern is not broken, *provided* that the implementation simply account for the fact that a REORDER_BUFFER_CHANGE_INTERNAL_SNAPSHOT change could come before some other (non-REORDER_BUFFER_CHANGE_INTERNAL_DELETE, A.K.A. non-superdelete) change came?
And that we might need to be more "patient" about deciding that a speculative insertion succeeded (more "patient" than considering only a single non-superdelete change, which can be anything else)? -- Peter Geoghegan
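The "patience" question above amounts to deciding which change kinds resolve a pending speculative insertion: internal changes such as new catalog snapshots must be passed over without resolving it either way. A rough sketch with invented kind names (not the real ReorderBuffer change types):

```python
# Change kinds that are internal to transaction reassembly and must not
# be taken as evidence that the speculative insertion succeeded or failed.
NON_RESOLVING = {"internal_snapshot", "internal_command_id"}

def resolve(pending_seen, later_changes):
    """Return 'succeeded', 'superseded', or 'pending' for a speculative
    insertion, given the change kinds that follow it."""
    if not pending_seen:
        return "pending"
    for kind in later_changes:
        if kind in NON_RESOLVING:
            continue                 # be patient: this resolves nothing
        if kind == "spec_delete":
            return "superseded"      # a super-deletion arrived
        return "succeeded"           # any other insert/update/delete
    return "succeeded"               # drained with no super-deletion
```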
On 17 April 2015 at 12:54, Stephen Frost <sfrost@snowman.net> wrote: > Dean, I've been working through your patches over the past couple of > days (apologies for the lack of updates, just been busy) and hope to > push them very shortly (ie: by the end of the weekend). > Cool. Thanks. > One thing that I was hoping to discuss a bit is that I've gone ahead and > added another set of hooks, so we can have both "permissive" and > "restrictive" policies be provided from the hook. It's a bit late to > make the grammar and other changes which would be required to add a > "restrictive" policy option to the built-in RLS, but adding the hooks is > relatively low-impact. > Sounds interesting. Perhaps that discussion should be moved to a new thread. > I'm also going to be including a test_rls_hooks module into > src/test/modules which will test those hooks and provide an example of > how to use them. > Good idea. I had been thinking that it would be good to test RLS hooks. > As for the discussion- there was some concern raised about extensions > being able to "override" built-in policies by using the hooks a certain > way. I don't entirely follow the logic behind that concern as an > extension has the ability to read the files on disk directly from C > code, should it be written to do so, and so not providing a hook to add > "permissive" policies is denying useful functionality for very questionable > gain, in my view at least. > > Thoughts? > Yeah, perhaps that concern is somewhat overblown and shouldn't stand in the way of allowing a hook to add permissive policies. Regards, Dean
On Fri, Apr 17, 2015 at 1:04 PM, Peter Geoghegan <pg@heroku.com> wrote: > What you meant was that you'd decided that this pattern is not broken, > *provided* that the implementation simply account for the fact that a > REORDER_BUFFER_CHANGE_INTERNAL_SNAPSHOT change could come before some > other (non-REORDER_BUFFER_CHANGE_INTERNAL_DELETE, A.K.A. > non-superdelete) change came? And that we might need to be more > "patient" about deciding that a speculative insertion succeeded (more > "patient" than considering only one single non-superdelete change, > that can be anything else)? Attached patch, V3.4, implements what I believe you and Heikki have in mind here. There is a minimal, dedicated "affirmation" WAL record now. Importantly, it is not possible for committed tuples to be speculative (i.e. to have a magic value in their t_ctid field that indicates being a speculative tuple - Andres felt very strongly that we ought to be able to assume that committed tuples are not speculative). There are a bunch of new assertions to make sure that this actually holds now, all within tqual.c. I found Dean's recent argument [1] for mostly following the existing RLS behaviors convincing. I'm now tracking what Dean called insertCheckQuals, updateUsingQuals and updateCheckQuals separately within the executor, while enforcing each (including updateUsingQuals) in the manner of WCOs (i.e. there are no silent failures when updateUsingQuals does not pass on a TARGET.* tuple, just as before - there is a WCO-style error thrown instead). And, as Dean and Stephen recently suggested, there is one and only one tuple (per ExecInsert() call) involved in enforcement for each of these three quals (or these three OR'd lists of quals). These three tuples are the post-insert, the pre-update, and the post-update tuples, for the insertCheckQuals, updateUsingQuals and updateCheckQuals respectively.
The new implementation has extensive revised tests - the only sane way to write something like ON CONFLICT UPDATE's RLS support is using test-driven development, IMV. There is one intentional difference between what I've done here, and what I believe Dean wants: I don't bother checking the user-supplied quals under any circumstances (so I don't "Test user-supplied auxiliary WHERE clause (skip if not true)", as described in [1]). I think we should *always* throw an error, and not silently skip the UPDATE just because the user-supplied quals also limit the UPDATE. I don't want to introduce a complicated further distinction like that, because it seems like a distinction without a difference. Adding this behavior will not make something work that would not otherwise work - it will make a failure to UPDATE silent, and nothing more, AFAICT. That's much more subtle and much less necessary in the context of an auxiliary UPDATE (compared to a regular UPDATE).

Other changes:

* Changed magic offset, per Heikki's request.
* RLS documentation updated, in line with new ON CONFLICT UPDATE behavior (this made it simpler).
* Moved a bit of code from the IGNORE commit to the ON CONFLICT UPDATE commit (this concerns the new possibility of a heap_lock_tuple() HeapTupleInvisible return code introduced by ON CONFLICT UPDATE). It clearly belongs there.
* RLS error messages advertise what type of command the violation originated from, and if the WCO originated as a USING security barrier qual (i.e. whether or not the target tuple that necessitates taking the UPDATE path was where the violation occurred, or whether it occurred on the post-update tuple intended to be appended to the relation). ON CONFLICT UPDATE makes this distinction important, since it may not be obvious which policy was violated (maybe this should go as far as naming the policy directly - I'm waiting for Stephen to push Dean's work, actually, because it will probably bitrot what I have here).
These additional diagnostics were very helpful when writing those new RLS ON CONFLICT UPDATE tests, and will probably be helpful generally. Thoughts? [1] http://www.postgresql.org/message-id/CAEZATCVDdYRFbF4NMXTF-NKYibbR2VSfNXRWPyyByaCpV1jwEw@mail.gmail.com -- Peter Geoghegan
Attachment
On Sun, Apr 19, 2015 at 9:37 PM, Peter Geoghegan <pg@heroku.com> wrote: > Attached patch, V3.4, implements what I believe you and Heikki have in > mind here. Any plans to look at this, Svenne? You are signed up as a reviewer on the commitfest app. A review of the documentation, and interactions with other features like inheritance, updatable views and postgres_fdw would be useful at this point. Obviously I've gone to a lot of effort to document how things fit together at a high level on the UPSERT wiki page, but these aspects have not been commented on all that much. Thanks -- Peter Geoghegan
On 2015-04-19 21:37:51 -0700, Peter Geoghegan wrote: > Attached patch, V3.4, implements what I believe you and Heikki have in > mind here. I'm not 100% sure Heikki and I are on exactly the same page here :P

I'm looking at git diff $(git merge-base upstream/master HEAD).. where HEAD is e1a5822d164db0.

* The logical stuff looks much saner.
* Please add tests for the logical decoding stuff. Probably both a plain regression and an isolationtester test in contrib/test_decoding, including one that does spooling to disk.
* I don't like REORDER_BUFFER_CHANGE_INTERNAL_INSERT/DELETE as names. Why not _SPECINSERT and _SPECDELETE or such?
* Iff we're going to have the XLOG_HEAP_AFFIRM record, I'd rather have that guide the logical decoding code. Seems slightly cleaner.
* Still not a fan of the name 'arbiter' for the OC indexes.
* Gram.y needs a bit more discussion:
  * Can anybody see a problem with changing the precedence of DISTINCT & ON to nonassoc? Right now I don't see a problem given both are reserved keywords already. The reason the conflict exists AFAICS is because something like INSERT INTO foo SELECT DISTINCT ON CONFLICT IGNORE; is allowed by the grammar. The need for the nonassoc could be avoided by requiring DISTINCT to be followed by a column. We currently *do* enforce that, just not in the parser (c.f. transformDistinctClause). That requires one more production in simple_select, and a nonoptional distinct clause. I've queued up a commit cleaning this up in my repo, feel free to merge and polish.
  * UpdateInsertStmt is a horrible name. OnConflictUpdateStmt maybe?
  * '(' index_params where_clause ')' is imo rather strange. The where clause is inside the parens? That's quite different from the original index clause.
  * SPEC_IGNORE, /* INSERT of "ON CONFLICT IGNORE" */ looks like a wrongly copied comment.
* The indentation in ReorderBufferCommit is clearly getting out of hand, I've queued up a commit cleaning this up in my repo, feel free to merge.
* I don't think we use the term 'ordinary table' in error messages so far.
* I still think it's unacceptable to redefine XLOG_HEAP_LAST_MULTI_INSERT as XLOG_HEAP_SPECULATIVE_TUPLE like you did. I'll try to find something better.
* I wonder if we now couldn't avoid changing heap_delete's API - we can always super delete if we find a speculative insertion now. It'd be nice not to break out of core callers if not necessary.
* breinbaas on IRC just mentioned that it'd be nice to have upsert as a link in the insert. Given that that's the pervasive term that doesn't seem absurd.

I think this is getting closer to a commit. Let's get this done.

Greetings, Andres Freund
On 2015-04-21 16:57:45 +0200, Andres Freund wrote: > * I still think it's unacceptable to redefine > XLOG_HEAP_LAST_MULTI_INSERT as XLOG_HEAP_SPECULATIVE_TUPLE like you > did. I'll try to find something better.

I think we should "just" split this into different flag values for insert/update/delete. I.e. something like

/* flags for heap insert and multi insert */
#define XLH_INSERT_ALL_VISIBLE_CLEARED
#define XLH_INSERT_LAST_MULTI_INSERT
#define XLH_INSERT_IS_SPECULATIVE
#define XLH_INSERT_CONTAINS_NEW_TUPLE

/* flags for update */
#define XLH_UPDATE_OLD_ALL_VISIBLE_CLEARED
#define XLH_UPDATE_NEW_ALL_VISIBLE_CLEARED
#define XLH_UPDATE_CONTAINS_OLD_TUPLE
#define XLH_UPDATE_CONTAINS_OLD_KEY
#define XLH_UPDATE_CONTAINS_NEW_TUPLE
#define XLH_UPDATE_PREFIX_FROM_OLD
#define XLH_UPDATE_SUFFIX_FROM_OLD

/* flags for delete */
#define XLH_DELETE_ALL_VISIBLE_CLEARED
#define XLH_DELETE_CONTAINS_OLD_TUPLE
#define XLH_DELETE_CONTAINS_OLD_KEY

Greetings, Andres Freund
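The effect of the proposed split can be illustrated outside C: with per-record-type flag namespaces, the same bit value can be reused across insert, update and delete records without colliding, because a flags byte is always interpreted relative to its record type. The numeric values below are invented for illustration; the proposal above deliberately leaves them unassigned:

```python
# Illustrative bit assignments only - not the real WAL flag values.
XLH_INSERT_ALL_VISIBLE_CLEARED = 1 << 0
XLH_INSERT_IS_SPECULATIVE      = 1 << 1

XLH_DELETE_ALL_VISIBLE_CLEARED = 1 << 0  # same bit, different record type
XLH_DELETE_CONTAINS_OLD_TUPLE  = 1 << 1

def decode_insert(flags):
    # an insert record's flags byte is decoded only with insert flags
    return {
        "all_visible_cleared": bool(flags & XLH_INSERT_ALL_VISIBLE_CLEARED),
        "speculative": bool(flags & XLH_INSERT_IS_SPECULATIVE),
    }
```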
On Tue, Apr 21, 2015 at 7:57 AM, Andres Freund <andres@anarazel.de> wrote: > I'm not 100% sure Heikki and I are on exactly the same page here :P > > I'm looking at git diff $(git merge-base upstream/master HEAD).. where > HEAD is e1a5822d164db0. Merged your stuff into my Github branch. Heikki is pushing changes there directly now. > * The logical stuff looks much saner. Cool. > * Please add tests for the logical decoding stuff. Probably both a plain > regression and an isolationtester test in > contrib/test_decoding. Including one that does spooling to disk. Working on it...hope to push that to Github soon. > * I don't like REORDER_BUFFER_CHANGE_INTERNAL_INSERT/DELETE as names. Why not > _SPECINSERT and _SPECDELETE or such? Changed that on Github. > * Iff we're going to have the XLOG_HEAP_AFFIRM record, I'd rather have > that guide the logical decoding code. Seems slightly cleaner. I thought that you didn't think that would always work out? That in some possible scenario it could break? > * Still not a fan of the name 'arbiter' for the OC indexes. What do you prefer? Seems to describe what we're talking about reasonably well to me. > * Gram.y needs a bit more discussion: > * Can anybody see a problem with changing the precedence of DISTINCT & > ON to nonassoc? Right now I don't see a problem given both are > reserved keywords already. Why did you push code that solved this in a roundabout way, but without actually reverting my original nonassoc changes (which would now not result in shift/reduce conflicts)? What should we do about that? Seems you're unsure (otherwise you'd have reverted my thing). I don't like that you seem to have regressed diagnostic output [1]. Surely it's simpler to use the nonassoc approach? I think this works by giving the relevant keywords an explicit priority lower than '(', so that a rule with ON CONFLICT '(' will shift rather than reducing a conflicting rule (CONFLICT is an unreserved keyword).
I wrote the code so long ago that I can't really remember why I thought it was the right thing, though. > * UpdateInsertStmt is a horrible name. OnConflictUpdateStmt maybe? Done on Github. > * '(' index_params where_clause ')' is imo rather strange. The where > clause is inside the parens? That's quite different from the > original index clause. I don't know. Maybe I was lazy about fixing shift/reduce conflicts. :-) I'll look at this some more. > * SPEC_IGNORE, /* INSERT of "ON CONFLICT IGNORE" */ looks like > a wrongly copied comment. Not sure what you mean here. Please clarify. > * The indentation in RoerderBufferCommit is clearly getting out of hand, > I've queued up a commit cleaning this up in my repo, feel free to merge. Done on Github. > * I don't think we use the term 'ordinary table' in error messages so > far. Fixed on Github. > * I still think it's unacceptable to redefine > XLOG_HEAP_LAST_MULTI_INSERT as XLOG_HEAP_SPECULATIVE_TUPLE like you > did. I'll try to find something better. I did what you suggested in your follow-up e-mail (changes are on Github [2]). > * I wonder if we now couldn't avoid changing heap_delete's API - we can > always super delete if we find a speculative insertion now. It'd be > nice not to break out of core callers if not necessary. Maybe, but if there is a useful way to break out only a small subset of heap_delete(), I'm not seeing it. Most of the callers that need a new NULL argument are heap_insert() callers, actually. There are now 3 heap_delete() callers, up from 2. > * breinbaas on IRC just mentioned that it'd be nice to have upsert as a > link in the insert. Given that that's the pervasive term that doesn't > seem absurd. Not sure what you mean. Where would the link appear? It is kind of hard to categorize that text so that we're strictly either talking about INSERT or UPSERT. Might be possible, though. > I think this is getting closer to a commit. Let's get this done. Great! 
The blockers to committing the IGNORE patch I see are:

* We need to tweak some of the logical decoding stuff a bit more, I feel. Firm up the details of whether we treat a confirmation record as a logical decoding change that is involved in the new dance during transaction reassembly.

* We need to sort out those issues with the grammar, since that only really applies to the inference specification. Maybe the WHERE clause that the inference specification accepts can be broken out. No ON CONFLICT UPDATE-specific issues left there, AFAICT.

That really seems totally doable in just a couple of days. The blockers for committing the ON CONFLICT UPDATE patch are larger, but doable:

* We need to figure out the tuple lock strength details. I think this is doable, but it is the greatest challenge to committing ON CONFLICT UPDATE at this point. Andres feels that we should require no greater lock strength than an equivalent UPDATE. I suggest we get to this after committing the IGNORE variant. We probably need to discuss this some more.

* At the very least, I need to rebase my RLS patch onto what Stephen just pushed (Dean's work). It would be nice to get Stephen and/or Dean to review those aspects, as subject matter experts.

* It would also be nice to get someone like Tom to take a quick look at what I'm doing in the optimizer, and in particular the rewriter. Maybe he'd feel that the normalization that I'm doing (the ExcludedExpr stuff) would be better suited to the optimizer. This doesn't have to be a blocker to commit - it's just a suggestion.

Do you agree with my assessment? Did I miss a blocker? I'd like to hear what Heikki has to say here too, now that he is pushing code to Github. What needs more work before we can commit 1) ON CONFLICT IGNORE, and 2) ON CONFLICT UPDATE?
Thanks

[1] https://github.com/petergeoghegan/postgres/commit/75d96c23676fd91568e9ec638470250c8b5e5f1b
[2] https://github.com/petergeoghegan/postgres/commit/0cfde636bf1ffca438418fa0c02043805e99f30f
--
Peter Geoghegan
On Wed, Apr 22, 2015 at 3:23 PM, Peter Geoghegan <pg@heroku.com> wrote:
> * We need to sort out those issues with the grammar, since that only
>   really applies to the inference specification. Maybe the WHERE clause
>   that the inference specification accepts can be broken out. No ON
>   CONFLICT UPDATE specific issues left there, AFAICT though.

I pushed some code that deals with the predicate being within parentheses:

https://github.com/petergeoghegan/postgres/commit/358854645279523310f998dfc9cb3fe3e165ce1e

(the predicate now follows the index attributes/expressions, in the style of CREATE INDEX). We still need to reconcile these changes to the grammar with your own, Andres. I'm going to wait to hear back on what you think about that.

Note that this removal:

-%nonassoc DISTINCT
-%nonassoc ON

was incidental to the commit (this is the code you could have removed when you modified the grammar, adding a new production, but didn't).
--
Peter Geoghegan
On 2015-04-22 15:23:16 -0700, Peter Geoghegan wrote:
> On Tue, Apr 21, 2015 at 7:57 AM, Andres Freund <andres@anarazel.de> wrote:
> > * Iff we're going to have the XLOG_HEAP_AFFIRM record, I'd rather have
> >   that guide the logical decoding code. Seems slightly cleaner.
>
> I thought that you didn't think that would always work out? That in
> some possible scenario it could break?

I don't think there's a real problem. You obviously have to do it right (i.e. only abort insertion if there's an insert/update/delete or commit). Speaking of commits, without having rechecked: I think you're missing cleanup of the pending insertion on commit.

> > * Gram.y needs a bit more discussion:
> >   * Can anybody see a problem with changing the precedence of DISTINCT &
> >     ON to nonassoc? Right now I don't see a problem given both are
> >     reserved keywords already.
>
> Why did you push code that solved this in a roundabout way, but
> without actually reverting my original nonassoc changes (which would
> now not result in shift/reduce conflicts)?

I pushed the changes to a separate repo so you could polish them ;)

> What should we do about that?

I'm pretty sure we should not do the %nonassoc stuff.

> I don't like that you seem to have regressed
> diagnostic output [1].

Meh^5. This is the same type of error we get in literally hundreds of other syntax errors. And in contrast to the old error it'll actually have a correct error position.

> Surely it's simpler to use the nonassoc approach?

I think it's much harder to foresee all the consequences of that.

> > * SPEC_IGNORE, /* INSERT of "ON CONFLICT IGNORE" */ looks like
> >   a wrongly copied comment.
>
> Not sure what you mean here. Please clarify.

I'm not sure either ;)

> > * I wonder if we now couldn't avoid changing heap_delete's API - we can
> >   always super delete if we find a speculative insertion now. It'd be
> >   nice not to break out-of-core callers if not necessary.
> Maybe, but if there is a useful way to break out only a small subset
> of heap_delete(), I'm not seeing it.

I think you misread my statement: I'm saying we don't need the new argument anymore, even if we still do the super-deletion in heap_delete(). Now that the speculative insertion will not be visible (as in seen on a tuple they could delete) to other backends, we can just do the super deletion if we see that the tuple is a promise one.

> > * breinbaas on IRC just mentioned that it'd be nice to have upsert as a
> >   link in the insert. Given that that's the pervasive term that doesn't
> >   seem absurd.
>
> Not sure what you mean. Where would the link appear?

The index, i.e. it'd just be another indexterm. It seems good to give people a link.

> * We need to figure out the tuple lock strength details. I think this
>   is doable, but it is the greatest challenge to committing ON CONFLICT
>   UPDATE at this point. Andres feels that we should require no greater
>   lock strength than an equivalent UPDATE. I suggest we get to this
>   after committing the IGNORE variant. We probably need to discuss this
>   some more.

I want to see a clear way forward before we commit parts. It doesn't have to be completely iron-clad, but the way forward should be pretty clear. What's the problem you're seeing right now making this work? A couple of days ago on jabber you seemed to see a way of doing this?

The reason I think this has to use the appropriate lock level is that it'll otherwise re-introduce the deadlocks that fk locks resolved. Given how much pain we endured to get fk locks, that seems like a bad deal.

Greetings,

Andres Freund
On 2015-04-22 16:40:07 -0700, Peter Geoghegan wrote:
> On Wed, Apr 22, 2015 at 3:23 PM, Peter Geoghegan <pg@heroku.com> wrote:
> > * We need to sort out those issues with the grammar, since that only
> >   really applies to the inference specification. Maybe the WHERE clause
> >   that the inference specification accepts can be broken out. No ON
> >   CONFLICT UPDATE specific issues left there, AFAICT though.
>
> I pushed some code that deals with the predicate being within parentheses:
>
> https://github.com/petergeoghegan/postgres/commit/358854645279523310f998dfc9cb3fe3e165ce1e

And the way you've used nonassoc here doesn't look correct. You're hiding legitimate ambiguities in the grammar. UPDATE is an unreserved keyword, so for

... ON CONFLICT '(' index_params ')' where_clause OnConflictUpdateStmt

it won't be able to discern whether an UPDATE in the WHERE clause is part of the where_clause or OnConflictUpdate. This is legal:

SELECT * FROM (SELECT true as update) f WHERE update;

i.e. 'update' can be the last part of a WHERE clause.

Essentially what you're trying to do with the nonassoc is hiding that UPDATE and IGNORE need to be reserved keywords with the syntax you're proposing. We can either make them reserved or change the syntax. One way to avoid making them reserved keywords - which would be somewhat painful - is to add a 'DO' before the IGNORE/UPDATE. I.e. something like

ON CONFLICT opt_conflict_expr DO OnConflictUpdateStmt
| ON CONFLICT opt_conflict_expr DO IGNORE

Greetings,

Andres Freund
Apologies for butting in, but can I (as a user) express a preference as a user against DO?

Firstly, it looks horrible. And what's to stop me having "SELECT true AS do" in the where clause (as per your UPDATE objection)?

Shouldn't UPDATE be a reserved keyword anyway? AIUI ANSI suggests so.

http://developer.mimer.se/validator/sql-reserved-words.tml

I had always assumed it was; anyone who produced a query for me that contained update in an unusual context would get slapped heavily.

Geoff

On 23 April 2015 at 11:54, Andres Freund <andres@anarazel.de> wrote:
> Essentially what you're trying to do with the nonassoc is hiding that
> UPDATE and IGNORE need to be reserved keywords with the syntax you're
> proposing. We can either make them reserved or change the syntax.
>
> One way to avoid making them reserved keywords - which would be somewhat
> painful - is to add a 'DO' before the IGNORE/UPDATE.
On 23/04/15 14:34, Geoff Winkless wrote:
> Apologies for butting in but can I (as a user) express a preference as a
> user against DO?
>
> Firstly, it looks horrible. And what's to stop me having "SELECT true AS
> do" in the where clause (as per your UPDATE objection)?

DO is already a reserved keyword. There is also some precedent for this in CREATE RULE. But I agree that it's not ideal syntax.

> Shouldn't UPDATE be a reserved keyword anyway? AIUI ANSI suggests so.
>
> http://developer.mimer.se/validator/sql-reserved-words.tml
>
> I had always assumed it was; anyone who produced a query for me that
> contained update in an unusual context would get slapped heavily.

Postgres currently has UPDATE as an unreserved keyword, and more importantly IGNORE is not a keyword at all, so making it a new reserved keyword is not nice at all.

--
Petr Jelinek
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On April 23, 2015 3:34:07 PM GMT+03:00, Geoff Winkless <pgsqladmin@geoff.dj> wrote:
>Apologies for butting in but can I (as a user) express a preference as
>a user against DO?

Sure. If you propose an alternative ;)

>Firstly, it looks horrible. And what's to stop me having "SELECT true
>AS do" in the where clause (as per your UPDATE objection)?

A syntax error. DO is a reserved keyword. Update is just unreserved (and thus can be used as a column label). Ignore is unreserved with the patch and was unreserved before. We obviously can make both reserved, but if so we have to do it for real, not by hiding the conflicts.

>Shouldn't UPDATE be a reserved keyword anyway? AIUI ANSI suggests so.
>
>http://developer.mimer.se/validator/sql-reserved-words.tml

It's not one right now. And ignore isn't a keyword at all atm.

(Please don't top post)

Andres

---
Please excuse brevity and formatting - I am writing this on my mobile phone.
> On April 23, 2015 3:34:07 PM GMT+03:00, Geoff Winkless <pgsqladmin@geoff.dj> wrote:
> >And what's to stop me having "SELECT true AS
> >do" in the where clause (as per your UPDATE objection)?
>
> A syntax error. DO is a reserved keyword. Update is just unreserved (and
> thus can be used as a column label). Ignore is unreserved with the patch
> and was unreserved before. We obviously can make both reserved, but if
> so we have to do it for real, not by hiding the conflicts.
Sorry, I misunderstood: so it's not the fact that it can't be used as a column label (because it can) but the fact that it can't then be referenced within a WHERE clause without quoting. Which is in itself utterly horrible, but that's a separate argument, and I can at least now understand your point.
So I could end up with
INSERT INTO mytable (somevalue) VALUES (999) ON CONFLICT ('myindex') WHERE update UPDATE update=1
but I would have to do
INSERT INTO mytable (somevalue) VALUES (999) ON CONFLICT ('myindex') WHERE "do" UPDATE "do"=1
?
> >Apologies for butting in but can I (as a user) express a preference as
> >a user against DO?
>
> Sure. If you propose an alternative ;)
Maybe I'm misreading it, but isn't index_predicate meant to be inside the brackets?

http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/on-conflict-docs/sql-insert.html certainly states that.
> It's not one right now. And ignore isn't a keyword at all atm.
As I said, it's my personal belief that anyone using keywords (of any kind) unquoted deserves what they get, but I see your point.
I think I object to the fact that you're talking about adding extra syntactic sugar to work around a parser-implementation problem, not an actual syntax problem (since UPDATE SET is unambiguous, isn't it?).
> (Please don't top post)
Mea culpa. I blame google :)
FWIW "DO IGNORE" just reads disgustingly. If you do finally go down the DO route, perhaps "DO NOTHING"? :)
Geoff
On 2015-04-23 14:34:02 +0100, Geoff Winkless wrote:
> > A syntax error. DO is a reserved keyword. Update is just unreserved (and
> > thus can be used as a column label). Ignore is unreserved with the patch
> > and was unreserved before. We obviously can make both reserved, but if so
> > we have to do it for real, not by hiding the conflicts
>
> Sorry, I misunderstood: so it's not the fact that it can't be used as a
> column label (because it can) but the fact that it can't then be referenced
> within a WHERE clause without quoting

Meh. You can use any keyword in quotes - because then they're not keywords anymore.

> INSERT INTO mytable (somevalue) VALUES (999) ON CONFLICT ('myindex') WHERE
> update UPDATE update=1
>
> but I would have to do
>
> INSERT INTO mytable (somevalue) VALUES (999) ON CONFLICT ('myindex') WHERE
> "do" UPDATE "do"=1

Yes.

> Maybe I'm misreading it, but isn't index_predicate meant to be inside the
> brackets?
>
> http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/on-conflict-docs/sql-insert.html

That has changed since. And for good reason: it's pretty odd to have the WHERE clause inside the brackets when that's not the case for CREATE INDEX. More importantly, with multiple columns, for stuff like ON CONFLICT (a, b WHERE foo) it's unclear what the WHERE is actually attached to. We have that problem with aggregates and it has repeatedly caused confusion.

> As I said, it's my personal belief that anyone using keywords (of any
> kind) unquoted deserves what they get, but I see your point.

Given that IGNORE isn't even a keyword right now (9.5 master), that policy isn't particularly meaningful anyway.

> I think I object to the fact that you're talking about adding extra
> syntactic sugar to work around a parser-implementation problem, not an
> actual syntax problem (since UPDATE SET is unambiguous, isn't it?).

I fail to see the point of such an objection. We have an LALR parser (generated by bison). That implies a certain expressiveness.
You're suggesting that we change to a different kind of parser? I don't think it's necessarily unambiguous. I'm not particularly motivated to prove it, though - the point is that we rely on bison to prevent ambiguities. That only works if we're listening, and not if we're silencing warnings about ambiguities over the whole grammar.

Greetings,

Andres Freund
> > Maybe I'm misreading it, but isn't index_predicate meant to be inside the
> > brackets?
> >
> > http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/on-conflict-docs/sql-insert.html
>
> That has changed since.
Oh, helpful. :)
I'll shut up. I have a feeling that my objection is really with the very idea of unreserved keywords and I have a feeling that there will be rather more people shouting me down if I go off on that particular rant; meanwhile it's 20 years since I used yacc in earnest and it's too hazy to be able to argue about what it is or isn't capable of.
When I set out I was really only hoping to express a preference as a user; on balance I would really rather not have DO IGNORE, if it were possible to avoid, because it's really ugly, but DO UPDATE/DO NOTHING I could just about cope with (and means you don't need to add IGNORE as a keyword, win!), although it still mildly pains me that there's an additional unnecessary word.
But I certainly don't object enough to hold up you guys doing the actual work for my benefit (among others, obviously!).
G
On 2015-04-23 15:52:40 +0100, Geoff Winkless wrote:
> When I set out I was really only hoping to express a preference as a user;
> on balance I would really rather not have DO IGNORE, if it were possible to
> avoid, because it's really ugly, but DO UPDATE/DO NOTHING I could just
> about cope with (and means you don't need to add IGNORE as a keyword,
> win!), although it still mildly pains me that there's an additional
> unnecessary word.

Yea, DO NOTHING is a good alternative. And I do like that we're adding one less keyword (which is also good for the parser's size/performance).

DO {UPDATE ... | NOTHING | LOCK}

doesn't sound too bad to me (yes, LOCK doesn't exist yet, except by writing UPDATE .. WHERE false ;)).

Greetings,

Andres Freund
On Thu, Apr 23, 2015 at 05:02:19PM +0200, Andres Freund wrote:
> On 2015-04-23 15:52:40 +0100, Geoff Winkless wrote:
> > When I set out I was really only hoping to express a preference as a user;
> > on balance I would really rather not have DO IGNORE, if it were possible to
> > avoid, because it's really ugly, but DO UPDATE/DO NOTHING I could just
> > about cope with (and means you don't need to add IGNORE as a keyword,
> > win!), although it still mildly pains me that there's an additional
> > unnecessary word.
>
> Yea, DO NOTHING is a good alternative. And I do like we're adding one
> keyword less (which is also good for the parser's
> size/performance).

No question that DO IGNORE sounds awkward. DO NOTHING also matches CREATE RULE --- another plus.

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +
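[Editor's note: the DO NOTHING / DO UPDATE spelling converged on above can be sketched concretely. The snippet below uses Python's sqlite3 module as a stand-in, since SQLite (3.24+) later adopted essentially this syntax; the table and column names are invented for illustration, and this is not the PostgreSQL implementation itself.]

```python
import sqlite3

# Hypothetical demo table; SQLite's upsert clause is closely modeled on
# the DO NOTHING / DO UPDATE syntax discussed in this thread, so it
# serves to sketch the intended semantics (requires SQLite >= 3.24).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE upsert_demo (k TEXT PRIMARY KEY, count INTEGER)")

upsert = ("INSERT INTO upsert_demo VALUES (?, ?) "
          "ON CONFLICT (k) DO UPDATE SET count = count + excluded.count")

conn.execute(upsert, ("a", 1))   # no conflict: a new row is inserted
conn.execute(upsert, ("a", 1))   # conflict on k: the DO UPDATE arm runs

# DO NOTHING: the conflicting insert is silently skipped.
conn.execute("INSERT INTO upsert_demo VALUES ('a', 99) "
             "ON CONFLICT (k) DO NOTHING")

print(conn.execute("SELECT k, count FROM upsert_demo").fetchall())
# -> [('a', 2)]
```

Here `excluded` refers to the row proposed for insertion, mirroring the EXCLUDED pseudo-relation seen in the jjanes_upsert test statements quoted elsewhere in this thread.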
On 04/20/2015 07:37 AM, Peter Geoghegan wrote:
>
> if (wco->commandType == CMD_INSERT)
>     command = "INSERT-applicable ";
> else if (wco->commandType == CMD_UPDATE)
>     command = "UPDATE-applicable ";
> else if (wco->commandType == CMD_DELETE)
>     command = "DELETE-applicable ";
> else if (wco->commandType == CMD_SELECT)
>     command = "SELECT-applicable ";
>
> ereport(ERROR,
>         (errcode(ERRCODE_WITH_CHECK_OPTION_VIOLATION),
>          errmsg("new row violates %sWITH CHECK OPTION %sfor \"%s\"",
>                 command ? command : "",
>                 wco->secBarrier ? "(originally security barrier) " : "",
>                 wco->viewname),
>          val_desc ? errdetail("Failing row contains %s.", val_desc) : 0));

That code in ExecWithCheckOptions is not translatable. See style guide:
http://www.postgresql.org/docs/devel/static/nls-programmer.html#NLS-GUIDELINES

- Heikki
On Thu, Apr 23, 2015 at 9:02 AM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> That code in ExecWithCheckOptions is not translatable. See style guide:
> http://www.postgresql.org/docs/devel/static/nls-programmer.html#NLS-GUIDELINES

It's probably going to need to change when I rebase on top of Dean's/Stephen's work, anyway.
--
Peter Geoghegan
On Thu, Apr 23, 2015 at 12:55 AM, Andres Freund <andres@anarazel.de> wrote:
> I think you misread my statement: I'm saying we don't need the new
> argument anymore, even if we still do the super-deletion in
> heap_delete(). Now that the speculative insertion will not be visible
> (as in seen on a tuple they could delete) to other backends we can just
> do the super deletion if we see that the tuple is a promise one.

I disagree. I think it's appropriate that the one and only "super" heap_delete() caller make a point of indicating that that's what it's doing. The cost is only that the two other such callers must say that they're not doing it. Super deletion is a special thing, that logical decoding knows all about for example, and I think it's appropriate that callers ask explicitly.

>> > * breinbaas on IRC just mentioned that it'd be nice to have upsert as a
>> >   link in the insert. Given that that's the pervasive term that doesn't
>> >   seem absurd.
>>
>> Not sure what you mean. Where would the link appear?
>
> The index, i.e. it'd just be another indexterm. It seems good to give
> people a link.

Oh, okay. Sure. Done on Github.

>> * We need to figure out the tuple lock strength details. I think this
>>   is doable, but it is the greatest challenge to committing ON CONFLICT
>>   UPDATE at this point. Andres feels that we should require no greater
>>   lock strength than an equivalent UPDATE. I suggest we get to this
>>   after committing the IGNORE variant. We probably need to discuss this
>>   some more.
>
> I want to see a clear way forward before we commit parts. It doesn't
> have to be completely iron-clad, but the way forward should be pretty
> clear. What's the problem you're seeing right now making this work? A
> couple days on jabber you seemed to see a way doing this?

I was really just identifying it as the biggest problem the patch now faces, and I want to face those issues down ASAP. Of course, that's good, because as you say it is a small problem in an absolute sense. The second most significant open item is rebasing on top of the recent RLS changes, IMV. I can see you and Heikki continuing to change small stylistic things, which I expect to continue for a little while. That's fine, but naturally I want to be aggressive about closing off these open issues that are not just general clean-up, but have some small level of risk of becoming more significant blockers.

> The reason I think this has to use the appropriate lock level is that
> it'll otherwise re-introduce the deadlocks that fk locks resolved. Given
> how much pain we endured to get fk locks, that seems like a bad deal.

Right.

--
Peter Geoghegan
On 2015-04-23 12:45:59 -0700, Peter Geoghegan wrote:
> On Thu, Apr 23, 2015 at 12:55 AM, Andres Freund <andres@anarazel.de> wrote:
> > I think you misread my statement: I'm saying we don't need the new
> > argument anymore, even if we still do the super-deletion in
> > heap_delete(). Now that the speculative insertion will not be visible
> > (as in seen on a tuple they could delete) to other backends we can just
> > do the super deletion if we see that the tuple is a promise one.
>
> I disagree. I think it's appropriate that the one and only "super"
> heap_delete() caller make a point of indicating that that's what it's
> doing. The cost is only that the two other such callers must say that
> they're not doing it. Super deletion is a special thing, that logical
> decoding knows all about for example, and I think it's appropriate
> that callers ask explicitly.

Unconvinced. Not breaking an API has its worth.

> The second most significant open item is rebasing on top of the recent
> RLS changes, IMV.

Not sure I agree. That part seems pretty mechanical to me.

* The docs aren't suitable for end users. I think this will take a fair
  amount of work.
* We're not yet sure about the syntax. In addition to the keyword issue,
  I'm unconvinced about having two WHERE clauses in the same statement. I
  think that'll end up confusing users a fair bit. Might still be the
  best way forward.
* The executor integration still isn't pretty, although Heikki is making
  strides there.
* I don't think anybody seriously has looked at the planner side yet.

Greetings,

Andres Freund
On Thu, Apr 23, 2015 at 12:53 PM, Andres Freund <andres@anarazel.de> wrote:
> Unconvinced. Not breaking an API has its worth.

Yeah, and I acknowledge that - but it's not something that it's appropriate to encapsulate, IMV. Let's just leave it to Heikki...I'd say he has the deciding vote, especially as the reviewer that is more in charge of the executor stuff than anything else.

>> The second most significant open item is rebasing on top of the recent
>> RLS changes, IMV.
>
> Not sure I agree. That part seems pretty mechanical to me.

Hopefully so.

> * The docs aren't suitable for end users. I think this will take a fair
>   amount of work.

It's hard to explain why something like the collation field in the inference specification matters to users...because it's only supported at all due to forwards compatibility concerns. It's hard to document certain things like that in an accessible way. In general, much of the effort of the last year was making the feature play nice, and considering a bunch of usability edge cases that are unlikely to come up, but still must be documented.

> * We're not yet sure about the syntax yet. In addition to the keyword
>   issue I'm unconvinced about having two WHERE clauses in the same
>   statement. I think that'll end up confusing users a fair bit. Might
>   still be the best way forward.

My previous approach to allowing an index predicate did at least clearly show that the predicate belonged to the inference specification, since it appeared between parentheses. Maybe we should do that after all? I don't think it much matters if the inference specification is not identical to CREATE INDEX. I don't want to give up inference of partial indexes, and I don't want to give up allowing the UPDATE to have a limited WHERE clause, and we can't just spell those differently here. So what alternative does that leave? Anyone else have an opinion?

> * The executor integration still isn't pretty, although Heikki is making
>   strides there.

That's just clean-up, though. I'm not worried about the risk of Heikki not succeeding at that.

> * I don't think anybody seriously has looked at the planner side yet.

Good point. That certainly needs to be looked at (and I include the rewriter parts in that). It's really not that much code, but (ideally) a subject matter expert should look into the whole ExcludedExpr dance in particular.

--
Peter Geoghegan
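[Editor's note: the "two WHERE clauses" being debated above can be made concrete. SQLite (3.24+) later copied both the inference-specification WHERE (for partial-index matching) and the auxiliary UPDATE's own WHERE from this design, so a runnable sketch is possible with Python's sqlite3 as a stand-in; all names below are invented for illustration, and this is not the PostgreSQL patch itself.]

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tasks (name TEXT, done INTEGER, tries INTEGER)")
# A partial unique index: only *unfinished* tasks are unique by name.
conn.execute("CREATE UNIQUE INDEX tasks_pending ON tasks (name) WHERE done = 0")
conn.execute("INSERT INTO tasks VALUES ('build', 0, 1)")

# Two distinct WHERE clauses in one statement: the first belongs to the
# inference specification (it lets the partial index be inferred as the
# arbiter), the second restricts the auxiliary UPDATE itself.
conn.execute(
    "INSERT INTO tasks VALUES ('build', 0, 4) "
    "ON CONFLICT (name) WHERE done = 0 "
    "DO UPDATE SET tries = tries + excluded.tries "
    "WHERE excluded.tries > 1")

print(conn.execute("SELECT name, tries FROM tasks").fetchall())
```

Whether the first WHERE should sit inside or outside the inference parentheses is exactly the readability question under discussion; the stand-in above uses the outside-the-parens placement.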
On 04/23/2015 10:53 PM, Andres Freund wrote:
> On 2015-04-23 12:45:59 -0700, Peter Geoghegan wrote:
>> On Thu, Apr 23, 2015 at 12:55 AM, Andres Freund <andres@anarazel.de> wrote:
>>> I think you misread my statement: I'm saying we don't need the new
>>> argument anymore, even if we still do the super-deletion in
>>> heap_delete(). Now that the speculative insertion will not be visible
>>> (as in seen on a tuple they could delete) to other backends we can just
>>> do the super deletion if we see that the tuple is a promise one.
>>
>> I disagree. I think it's appropriate that the one and only "super"
>> heap_delete() caller make a point of indicating that that's what it's
>> doing. The cost is only that the two other such callers must say that
>> they're not doing it. Super deletion is a special thing, that logical
>> decoding knows all about for example, and I think it's appropriate
>> that callers ask explicitly.
>
> Unconvinced. Not breaking an API has its worth.

The heapam API is not that stable; we've added arguments to those functions every once in a while, and I don't recall any complaints. So I'm with Peter that super-deletion should be requested explicitly by the caller.

That said, I'd actually like to see a separate heap_super_delete() function for that, rather than piggybacking on heap_delete(). It's a quite different operation. There'll be some duplication, but that seems better than a maze of if-else's in heap_delete.

- Heikki
On 2015-04-23 23:08:34 +0300, Heikki Linnakangas wrote:
> The heapam API is not that stable, we've added arguments to those functions
> every once in a while, and I don't recall any complaints.

I heard some, but not too many, that's true. I know that I've written code that'd be broken/need even more ifdefs ;)

> That said, I'd actually like to see a separate heap_super_delete() function
> for that, rather than piggybacking on heap_delete(). It's a quite different
> operation. There'll be some duplication, but seems better than a maze of
> if-else's in heap_delete.

+many. I've requested that change a couple of times now.

Greetings,

Andres Freund
On Thu, Apr 23, 2015 at 1:08 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> That said, I'd actually like to see a separate heap_super_delete() function
> for that, rather than piggybacking on heap_delete(). It's a quite different
> operation. There'll be some duplication, but seems better than a maze of
> if-else's in heap_delete.

I found a bug that seems to be down to commit
e3144183562d08e347f832f0b29daefe8bac617b on Github:

"""
commit e3144183562d08e347f832f0b29daefe8bac617b
Author: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date:   Thu Apr 23 18:38:11 2015 +0300

    Minor cleanup of check_exclusion_or_unique_constraint.

    To improve readability.
"""

At least, that's what a "git bisect" session showed.

Basically, at high client counts (128 clients only), after a few iterations
of the B-Tree test, the latest version of jjanes_upsert would see this
happen (error originates from ExecOnConflictUpdate(), with custom
instrumentation added to identify the invisible tuple):

2015-04-23 15:10:48 PDT [ txid: 0 ]: LOG:  database system was shut down at 2015-04-23 15:10:09 PDT
2015-04-23 15:10:48 PDT [ txid: 0 ]: LOG:  database system is ready to accept connections
2015-04-23 15:12:55 PDT [ txid: 472418 ]: ERROR:  attempted to lock invisible tuple (140,39)
2015-04-23 15:12:55 PDT [ txid: 472418 ]: STATEMENT:  insert into upsert_race_test (index, filler, count) values ('3896', random_characters(), '1') on conflict (index) update set count=TARGET.count + EXCLUDED.count, filler = EXCLUDED.filler where TARGET.index = EXCLUDED.index returning count

This seemed to only show up with fsync = on on my laptop, whereas in the
past some race conditions that I've found were easier to recreate with
fsync = off. I attach some notes from my bisect/debugging session,
including pg_xlogdump output (from a psql session - I like to manipulate
pg_xlogdump output using SQL). That's probably not all that interesting,
but I attach it all the same.

Hopefully this is something that Heikki can easily debug, since the commit
implicated here only concerned readability. A simple oversight? If you
want, Heikki, I can try and debug it, but it seems like something we're
better off having you sign off on if possible.

Thanks
-- 
Peter Geoghegan
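For readers unfamiliar with the workflow behind "that's what a git bisect session showed", here is a self-contained demo of `git bisect run` against a throwaway repository. The repository, commits, and marker file are all made up for illustration; in the real session the run script would rebuild PostgreSQL and exercise the jjanes_upsert stress test rather than checking for a file.

```shell
# Build a tiny history: two good commits, one that introduces the bug,
# and one later commit on top of it.
repo=$(mktemp -d)
cd "$repo"
git init -q
git -c user.name=demo -c user.email=demo@example.com commit -q --allow-empty -m "good 1"
git -c user.name=demo -c user.email=demo@example.com commit -q --allow-empty -m "good 2"
touch regression-marker
git add regression-marker
git -c user.name=demo -c user.email=demo@example.com commit -q -m "introduces bug"
git -c user.name=demo -c user.email=demo@example.com commit -q --allow-empty -m "later work"

# Bisect between HEAD (known bad) and the first commit (known good).
# The run script exits 0 for a good tree and nonzero for a bad one;
# here "bad" means the marker file is present.
git bisect start HEAD HEAD~3 >/dev/null
git bisect run sh -c 'test ! -f regression-marker' >/dev/null
first_bad=$(git show -s --format=%s refs/bisect/bad)
echo "$first_bad"
git bisect reset >/dev/null
```

`git bisect run` automates the good/bad marking, which is what makes bisecting a timing-sensitive stress-test failure tractable, provided the test script reproduces the failure reliably.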
Attachment
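The "manipulate pg_xlogdump output using SQL" trick mentioned above can be sketched as follows. The sample lines, regex, and table layout are hypothetical: real pg_xlogdump output carries more fields and its exact format varies by PostgreSQL version, and an in-memory SQLite database stands in for the psql session described in the email.

```python
import re
import sqlite3

# Hypothetical pg_xlogdump-style lines; invented for illustration.
sample = """\
rmgr: Heap        len (rec/tot):     54/    54, tx:     472418, lsn: 0/016C2028, desc: INSERT off 39
rmgr: Btree       len (rec/tot):     53/    53, tx:     472418, lsn: 0/016C2060, desc: INSERT_LEAF off 12
rmgr: Heap        len (rec/tot):     42/    42, tx:     472418, lsn: 0/016C2098, desc: DELETE off 39
"""

# Pull out the resource manager, transaction id, LSN and description.
pattern = re.compile(
    r"rmgr:\s+(?P<rmgr>\S+).*?tx:\s+(?P<tx>\d+),\s+lsn:\s+(?P<lsn>\S+),\s+desc:\s+(?P<desc>.*)"
)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wal (rmgr TEXT, tx INTEGER, lsn TEXT, descr TEXT)")
for line in sample.splitlines():
    m = pattern.match(line)
    if m:
        conn.execute(
            "INSERT INTO wal VALUES (?, ?, ?, ?)",
            (m["rmgr"], int(m["tx"]), m["lsn"], m["desc"]),
        )

# Ordinary SQL questions can now be asked of the WAL stream, e.g. how
# many heap records a given transaction emitted.
n = conn.execute(
    "SELECT count(*) FROM wal WHERE rmgr = 'Heap' AND tx = 472418"
).fetchone()[0]
print(n)  # 2
```

Once the records are rows, filtering a long WAL dump down to one transaction's heap activity is a one-line query rather than a grep pipeline.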
On Thu, Apr 23, 2015 at 12:45 PM, Peter Geoghegan <pg@heroku.com> wrote:
>>> * We need to figure out the tuple lock strength details. I think this
>>> is doable, but it is the greatest challenge to committing ON CONFLICT
>>> UPDATE at this point. Andres feels that we should require no greater
>>> lock strength than an equivalent UPDATE. I suggest we get to this
>>> after committing the IGNORE variant. We probably need to discuss this
>>> some more.
>>
>> I want to see a clear way forward before we commit parts. It doesn't
>> have to be completely iron-clad, but the way forward should be pretty
>> clear. What's the problem you're seeing right now making this work? A
>> couple days on jabber you seemed to see a way doing this?
>
> I was really just identifying it as the biggest problem the patch now
> faces, and I want to face those issues down ASAP. Of course, that's
> good, because as you say it is a small problem in an absolute sense.
> The second most significant open item is rebasing on top of the recent
> RLS changes, IMV.

OK, I pushed out code to do this a while ago. I must admit that I probably
significantly over-estimated the difficulty of doing this.

With Heikki's problematic commit reverted this works fine (I have not
actually reverted it myself...I'll leave it to Heikki to clean that up when
he gets around to it). The usual jjanes_upsert stress tests show no
problems.

I'm curious what you think about my proposal to go back to putting the
inference specification WHERE clause within the parentheses, since this
solves several problems, including clarifying to users that the predicate
is part of the inference clause.

-- 
Peter Geoghegan
On 04/24/2015 02:52 AM, Peter Geoghegan wrote:
> I found a bug that seems to be down to commit
> e3144183562d08e347f832f0b29daefe8bac617b on Github:
>
> """
> commit e3144183562d08e347f832f0b29daefe8bac617b
> Author: Heikki Linnakangas <heikki.linnakangas@iki.fi>
> Date:   Thu Apr 23 18:38:11 2015 +0300
>
>     Minor cleanup of check_exclusion_or_unique_constraint.
>
>     To improve readability.
> """
>
> At least, that's what a "git bisect" session showed.

Ok, I see now that I totally screwed up the conditions on "waitMode" in
that commit. I just pushed a fix to your github repository.

- Heikki
On Fri, Apr 24, 2015 at 12:31 AM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> Ok, I see now that I totally screwed up the conditions on "waitMode" in that
> commit. I just pushed a fix to your github repository.

I can confirm that the commit you just pushed prevents the implementation
from attempting to lock an invisible tuple, where previously it would
reliably fail given the same testcase.

Thanks
-- 
Peter Geoghegan
Peter,

* Peter Geoghegan (pg@heroku.com) wrote:
> I came up with something that is along the lines of what we discussed.
> I'll wait for you to push Dean's code, and rebase on top of that
> before submitting what I came up with. Maybe this can be rolled into
> my next revision if I work through Andres' most recent feedback
> without much delay.

This is done (finally!). Please take a look and let me know if you have
any questions or concerns. Happy to chat again also, of course.

Thanks!

Stephen
On Fri, Apr 24, 2015 at 5:50 PM, Stephen Frost <sfrost@snowman.net> wrote:
> * Peter Geoghegan (pg@heroku.com) wrote:
>> I came up with something that is along the lines of what we discussed.
>> I'll wait for you to push Dean's code, and rebase on top of that
>> before submitting what I came up with. Maybe this can be rolled into
>> my next revision if I work through Andres' most recent feedback
>> without much delay.
>
> This is done (finally!). Please take a look and let me know if you have
> any questions or concerns. Happy to chat again also, of course.

Great, thanks.

I didn't actually wait for you (as earlier indicated) before posting the
new approach to RLS in V3.4. However, I have some decent tests for the new
behaviors (I did test-driven development), so I think that when I rebase on
top of the new RLS stuff tomorrow, I'll find that it won't be too
difficult.

-- 
Peter Geoghegan
* Peter Geoghegan (pg@heroku.com) wrote:
> On Fri, Apr 24, 2015 at 5:50 PM, Stephen Frost <sfrost@snowman.net> wrote:
> > * Peter Geoghegan (pg@heroku.com) wrote:
> >> I came up with something that is along the lines of what we discussed.
> >> I'll wait for you to push Dean's code, and rebase on top of that
> >> before submitting what I came up with. Maybe this can be rolled into
> >> my next revision if I work through Andres' most recent feedback
> >> without much delay.
> >
> > This is done (finally!). Please take a look and let me know if you have
> > any questions or concerns. Happy to chat again also, of course.
>
> Great, thanks.
>
> I didn't actually wait for you (as earlier indicated) before posting
> the new approach to RLS in V3.4. However, I have some decent tests for
> the new behaviors (I did test-driven development), so I think that
> when I rebase on top of the new RLS stuff tomorrow, I'll find that it
> won't be too difficult.

Fantastic, glad to hear it!

Thanks!

Stephen
On Thu, Apr 23, 2015 at 6:07 PM, Peter Geoghegan <pg@heroku.com> wrote:
> Curious about what you think about my proposal to go back to putting
> the inference specification WHERE clause within the parenthesis, since
> this solves several problems, including clarifying to users that the
> predicate is part of the inference clause.

I've *provisionally* pushed code that goes back to the old way, Andres:

https://github.com/petergeoghegan/postgres/commit/2a5d80b27d2c5832ad26dde4651c64dd2004f401

Perhaps this is the least-worst way, after all.

-- 
Peter Geoghegan
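For readers following the syntax discussion, the upsert semantics at stake can be illustrated with SQLite's ON CONFLICT clause (available since SQLite 3.24), which was modeled on this PostgreSQL feature. The spelling differs from the in-development V3 patch quoted in this thread: the patch writes `on conflict (index) update set ... TARGET.x ...`, while SQLite uses `DO UPDATE` and the `excluded` pseudo-table, and the exact placement of the inference WHERE clause is precisely what is still being debated above. The table and values loosely mirror the upsert_race_test case from earlier in the thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE upsert_race_test (idx INTEGER PRIMARY KEY, count INTEGER)"
)

# The first insert takes the normal path.  The second conflicts on the
# unique index inferred from the parenthesized column list and takes
# the UPDATE path instead of raising a unique violation; unqualified
# "count" refers to the existing row, excluded.count to the new tuple.
for _ in range(2):
    conn.execute(
        "INSERT INTO upsert_race_test (idx, count) VALUES (3896, 1) "
        "ON CONFLICT (idx) DO UPDATE SET count = count + excluded.count"
    )

total = conn.execute(
    "SELECT count FROM upsert_race_test WHERE idx = 3896"
).fetchone()[0]
print(total)  # 2
```

The parenthesized column list is the "inference specification": it names which unique index arbitrates conflicts, which is why the thread cares where a partial-index predicate should attach to it.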