Thread: INSERT ... ON CONFLICT DO UPDATE with _any_ constraint
<div dir="ltr"><div class="gmail_default" style="font-family:verdana,sans-serif;font-size:small">I finally got around torunning some UPSERT tests on the development build, which is very exciting for me :)</div><div class="gmail_default" style="font-family:verdana,sans-serif;font-size:small"><br/></div><div class="gmail_default" style="font-family:verdana,sans-serif;font-size:small">I'mnot sure if I missed the point with this (probably...): I'm unclearon the reason why DO UPDATE requires explicitly specifying the constraint while DO NOTHING does not. </div><div class="gmail_default"style="font-family:verdana,sans-serif;font-size:small"><br /></div><div class="gmail_default" style="font-family:verdana,sans-serif;font-size:small">Ifit's a feature of the locking implementation (or something) thatfor "DO UPDATE" only one index can be used, then so be it.</div><div class="gmail_default" style="font-family:verdana,sans-serif;font-size:small"><br/></div><div class="gmail_default" style="font-family:verdana,sans-serif;font-size:small">Howeverif it would be possible to allow any conflict to run the UPDATEclause (in the same way as any conflict triggers DO NOTHING in the alternate form) I would personally find that verypleasant. </div><div class="gmail_default" style="font-family:verdana,sans-serif;font-size:small"><br /></div><div class="gmail_default"style="font-family:verdana,sans-serif;font-size:small">You could even then arbitrate on conflicts inthe UPDATE clause, if you had to, using (say)</div><div class="gmail_default" style="font-family:verdana,sans-serif;font-size:small"><br/></div><div class="gmail_default" style="font-size:small"><fontface="monospace, monospace">INSERT INTO mytable ... </font></div><div class="gmail_default"style="font-size:small"><font face="monospace, monospace">ON CONFLICT DO UPDATE SET </font></div><divclass="gmail_default" style="font-size:small"><font face="monospace, monospace"> col1=CASE </font></div><divclass="gmail_default" style="font-size:small"><font face="monospace, monospace"> WHEN mytable.uniquefield1=excluded.uniquefield1THEN </font><span style="font-family:monospace,monospace">targettedvalue1 </span></div><divclass="gmail_default" style="font-size:small"><fontface="monospace, monospace"> ELSE </font><span style="font-family:monospace,monospace">mytable.col1 </span></div><divclass="gmail_default" style="font-size:small"><fontface="monospace, monospace"> END,</font></div><div class="gmail_default" style="font-size:small"><fontface="monospace, monospace"> col2=CASE </font></div><div class="gmail_default" style="font-size:small"><fontface="monospace, monospace"> WHEN mytable.uniquefield2=excluded.uniquefield2 THEN </font><spanstyle="font-family:monospace,monospace">targettedvalue2 </span></div><div class="gmail_default" style="font-size:small"><fontface="monospace, monospace"> ELSE </font><span style="font-family:monospace,monospace">mytable.col2 </span></div><divclass="gmail_default" style="font-size:small"><fontface="monospace, monospace"> END;</font></div><div class="gmail_default" style="font-family:verdana,sans-serif;font-size:small"><br/></div><div class="gmail_default" style="font-family:verdana,sans-serif;font-size:small">Notexactly pretty but workable.</div><div class="gmail_default" style="font-family:verdana,sans-serif;font-size:small"><br/></div><div class="gmail_default" style="font-family:verdana,sans-serif;font-size:small">Ijust find it slightly upsetting that for (what I would expect is)the majority use case (when the INSERT would only ever trigger one unique constraint) one must still define the uniquefields.</div><div class="gmail_default" style="font-family:verdana,sans-serif;font-size:small"><br /></div><div class="gmail_default"style="font-family:verdana,sans-serif;font-size:small">In the event that the INSERT triggers a constraintthat the UPDATE fails to resolve, it will still fail in exactly the same way that running the ON CONFLICT on aspecific constraint would fail, so it's not like you gain any extra value from specifying the constraint, is it?</div><divclass="gmail_default" style="font-family:verdana,sans-serif;font-size:small"><br /></div><div class="gmail_default"style="font-family:verdana,sans-serif;font-size:small">As I said, I probably missed the point.</div><divclass="gmail_default" style="font-family:verdana,sans-serif;font-size:small"><br /></div><div class="gmail_default"style="font-family:verdana,sans-serif;font-size:small">Geoff<br /></div></div>
On 19 May 2015 at 16:32, I wrote:
In the event that the INSERT triggers a constraint that the UPDATE fails to resolve, it will still fail in exactly the same way that running the ON CONFLICT on a specific constraint would fail, so it's not like you gain any extra value from specifying the constraint, is it?
I don't know why I wrote this paragraph, it's just the product of me thinking of something else at the same time:
UPDATE obviously doesn't resolve a conflict as such.
Thinking about it more, I suppose if multiple constraints end up triggering for the same INSERT, it would require UPDATEs of multiple rows. Is that the issue?
Geoff
On 19 May 2015 at 11:49, Geoff Winkless <pgsqladmin@geoff.dj> wrote:
Can you comment on whether the docs are sufficiently detailed to explain this answer?
--
On 19 May 2015 at 16:32, I wrote:In the event that the INSERT triggers a constraint that the UPDATE fails to resolve, it will still fail in exactly the same way that running the ON CONFLICT on a specific constraint would fail, so it's not like you gain any extra value from specifying the constraint, is it?I don't know why I wrote this paragraph, it's just the product of me thinking of something else at the same time:UPDATE obviously doesn't resolve a conflict as such.Thinking about it more, I suppose if multiple constraints end up triggering for the same INSERT, it would require UPDATEs of multiple rows. Is that the issue?
I'm sure we'll be asked these questions many times.
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
I'm sure we'll be asked these questions many times.Can you comment on whether the docs are sufficiently detailed to explain this answer?
Well http://www.postgresql.org/docs/devel/static/sql-insert.html explains that a conflict_target clause is required but doesn't explain why.
It _does_ make clear that multiple UPDATEs to the same row are not allowed, but that in itself doesn't automatically restrict the use of multiple constraint targets; I could easily INSERT a set of values that would trigger that failure with just one constraint target.
http://www.postgresql.org/docs/devel/static/sql-insert.html talks about how MySQL's ON DUPLICATE can only act against the first matching row where multiple constraints match against multiple rows. I suppose if that were the case here (ie the first excluding row would stop other rows firing against the UPDATE) would break the deterministic feature, but it's not clear if that's true or not. I don't see why multiple target rows couldn't be updated based on multiple constraints, that would not in-and-of-itself break determinism.
If I'm missing the obvious, accept my apologies.
Geoff
On Tue, May 19, 2015 at 12:57 PM, Geoff Winkless <pgsqladmin@geoff.dj> wrote: > Well http://www.postgresql.org/docs/devel/static/sql-insert.html explains > that a conflict_target clause is required but doesn't explain why. Yes, for ON CONFLICT DO UPDATE, it is mandatory. > It _does_ make clear that multiple UPDATEs to the same row are not allowed, > but that in itself doesn't automatically restrict the use of multiple > constraint targets; I could easily INSERT a set of values that would trigger > that failure with just one constraint target. True. > http://www.postgresql.org/docs/devel/static/sql-insert.html talks about how > MySQL's ON DUPLICATE can only act against the first matching row where > multiple constraints match against multiple rows. I suppose if that were the > case here (ie the first excluding row would stop other rows firing against > the UPDATE) would break the deterministic feature, but it's not clear if > that's true or not. I don't see why multiple target rows couldn't be updated > based on multiple constraints, that would not in-and-of-itself break > determinism. > > If I'm missing the obvious, accept my apologies. It's trivial to modify Postgres to not require that a specific unique index be inferred, so that you can omit the inference specification for DO UPDATE just as you can for DO NOTHING. That would make it work in a similar way to MySQL; whatever actually conflict was detected would be assumed to be cause to take the alternative update path. The only reason I can see for wanting to do this is where you're running a migration or something, and two unique indexes are equivalent anyway. Like maybe you have a partial index and a non-partial index, and you're just about to drop one of them. But the inference specification will do the right thing here anyway -- multiple unique indexes can be inferred for edge cases like this. I have a hard time imagining why you'd ever not want to be explicit about what to take the alternative path on for the DO UPDATE variant. Unless perhaps you have a different UPDATE targetlist and so on corresponding to that case, which is currently not possible -- but then what if multiple constraints have would-be violations at the same time? It gets awfully messy very quickly. What do you have in mind? -- Peter Geoghegan
It's trivial to modify Postgres to not require that a specific unique
index be inferred, so that you can omit the inference specification
for DO UPDATE just as you can for DO NOTHING. That would make it work
in a similar way to MySQL; whatever actually conflict was detected
would be assumed to be cause to take the alternative update path.
Except that would break the deterministic behaviour, surely? Because if you only updated one row based on which constraint matched first, the row that was updated would depend on the order in which the constraints were evaluated, yes? I was expecting that matching two constraints would end up UPDATEing two separate rows.
I have a hard time imagining why you'd ever not want to be explicit
about what to take the alternative path on for the DO UPDATE variant.
What do you have in mind?
If I'm being honest, my main driver is laziness :) I don't mind specifying the constraint if I can understand why it's required, but otherwise it just seems like I need to do more typing for no reason. Especially when there's only one unique constraint on a table.
Geoff
On Tue, May 19, 2015 at 1:36 PM, Geoff Winkless <pgsqladmin@geoff.dj> wrote: > On 19 May 2015 at 21:12, Peter Geoghegan <pg@heroku.com> wrote: >> >> It's trivial to modify Postgres to not require that a specific unique >> index be inferred, so that you can omit the inference specification >> for DO UPDATE just as you can for DO NOTHING. That would make it work >> in a similar way to MySQL; whatever actually conflict was detected >> would be assumed to be cause to take the alternative update path. > > > Except that would break the deterministic behaviour, surely? Because if you > only updated one row based on which constraint matched first, the row that > was updated would depend on the order in which the constraints were > evaluated, yes? I was expecting that matching two constraints would end up > UPDATEing two separate rows. Well, it would be deterministic to the extent that the indexes would be evaluated in OID order. But yes, the first would-be duplicate violation would make the update path be taken once and only once for the row proposed for insertion -- at that point, you've given up on insertion (unless there is a row locking conflict). Just like MySQL, I believe. How can you find a would-be violation without inserting? How can you insert without also violating the other thing? It's far messier than it first appears. >> I have a hard time imagining why you'd ever not want to be explicit >> about what to take the alternative path on for the DO UPDATE variant. >> >> What do you have in mind? > > > If I'm being honest, my main driver is laziness :) I don't mind specifying > the constraint if I can understand why it's required, but otherwise it just > seems like I need to do more typing for no reason. Especially when there's > only one unique constraint on a table. Well, I don't have zero sympathy for that, but I'm pretty sure that that's what other people wanted. If I'm being honest, I don't actually remember how true that was. -- Peter Geoghegan
On 19 May 2015 at 16:36, Geoff Winkless <pgsqladmin@geoff.dj> wrote:
--
It's trivial to modify Postgres to not require that a specific unique
index be inferred, so that you can omit the inference specification
for DO UPDATE just as you can for DO NOTHING. That would make it work
in a similar way to MySQL; whatever actually conflict was detected
would be assumed to be cause to take the alternative update path.Except that would break the deterministic behaviour, surely? Because if you only updated one row based on which constraint matched first, the row that was updated would depend on the order in which the constraints were evaluated, yes?
It would depend upon the evaluation order, but that would not break determinism unless you allowed a random evaluation order.
Omitting the clause for DO NOTHING yet requiring it for DO UPDATE doesn't make sense.
We should allow DO UPDATE to exclude a constraint and apply a deterministic order to the constraints. 1. PK if it exists. 2. Replica Identity, when not PK, 3. UNIQUE constraints in name order, like triggers, so users can define a default evaluation order, just like they do with triggers.
I was expecting that matching two constraints would end up UPDATEing two separate rows.
It's not clear to me how a single INSERT could cause two or more UPDATEs.
I have a hard time imagining why you'd ever not want to be explicit
about what to take the alternative path on for the DO UPDATE variant.
What do you have in mind?If I'm being honest, my main driver is laziness :) I don't mind specifying the constraint if I can understand why it's required, but otherwise it just seems like I need to do more typing for no reason. Especially when there's only one unique constraint on a table.
1) Ease of use - Unique constraints don't change very often. This saves time for the common case where they stay the same. It also saves time if they do change, because you avoid having to completely recode your app AND make that happen at exactly the same time you apply the change of unique constraint.
2) Compatibility with MySQL
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, May 19, 2015 at 1:57 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > We should allow DO UPDATE to exclude a constraint and apply a deterministic > order to the constraints. 1. PK if it exists. 2. Replica Identity, when not > PK, 3. UNIQUE constraints in name order, like triggers, so users can define > a default evaluation order, just like they do with triggers. That seems like something way worse than just allowing it for all constraints. >>> I have a hard time imagining why you'd ever not want to be explicit >>> about what to take the alternative path on for the DO UPDATE variant. >>> >>> What do you have in mind? >> >> >> If I'm being honest, my main driver is laziness :) I don't mind specifying >> the constraint if I can understand why it's required, but otherwise it just >> seems like I need to do more typing for no reason. Especially when there's >> only one unique constraint on a table. > > > 1) Ease of use - Unique constraints don't change very often. This saves time > for the common case where they stay the same. It also saves time if they do > change, because you avoid having to completely recode your app AND make that > happen at exactly the same time you apply the change of unique constraint. I don't see how it's possible to change unique constraints in a way that breaks the inference specification without that actually being desirable -- naming the constraint by name is an escape hatch that is generally discouraged. That's the whole point of inference. I put an awful lot of work into making unique index inference as forgiving as possible. For example, it doesn't care what order attributes appear in, or if they appear redundantly, or if an ON CONFLICT unique index predicate is more selective than any available index that is otherwise satisfied (there is a call to predicate_implied_by()). > 2) Compatibility with MySQL But what you describe isn't compatible with MySQL. It's totally novel. -- Peter Geoghegan
On 19 May 2015 at 17:10, Peter Geoghegan <pg@heroku.com> wrote:
Upthread you said
for DO UPDATE just as you can for DO NOTHING. That would make it work
On Tue, May 19, 2015 at 1:57 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> We should allow DO UPDATE to exclude a constraint and apply a deterministic
> order to the constraints. 1. PK if it exists. 2. Replica Identity, when not
> PK, 3. UNIQUE constraints in name order, like triggers, so users can define
> a default evaluation order, just like they do with triggers.
That seems like something way worse than just allowing it for all constraints.
I'm talking about the evaluation order; it would still match all constraints, otherwise they wouldn't be constraints.
> 2) Compatibility with MySQL
But what you describe isn't compatible with MySQL. It's totally novel.
"It's trivial to modify Postgres to not require that a specific unique
index be inferred, so that you can omit the inference specificationfor DO UPDATE just as you can for DO NOTHING. That would make it work
in a similar way to MySQL"
Similar is good and useful. Full compatibility is even better.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, May 19, 2015 at 2:28 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > On 19 May 2015 at 17:10, Peter Geoghegan <pg@heroku.com> wrote: >> >> On Tue, May 19, 2015 at 1:57 PM, Simon Riggs <simon@2ndquadrant.com> >> wrote: >> > We should allow DO UPDATE to exclude a constraint and apply a >> > deterministic >> > order to the constraints. 1. PK if it exists. 2. Replica Identity, when >> > not >> > PK, 3. UNIQUE constraints in name order, like triggers, so users can >> > define >> > a default evaluation order, just like they do with triggers. >> >> That seems like something way worse than just allowing it for all >> constraints. > > > I'm talking about the evaluation order; it would still match all > constraints, otherwise they wouldn't be constraints. But it doesn't match all constraints when a would-be conflict is detected. IOW, we lock the row and go to UPDATE, and then the user is on their own insofar as avoiding duplicate violations goes. What might have happened in other unique indexes (had that original would-be dup violation not occurred) is irrelevant (with the MySQL thing, say) -- you better just get it right, and know that if a dup violation occurs it was the one you anticipated (e.g. because there is only one unique index anyway). With Postgres, we want to make sure that the user has put thought into the condition they take that update path on, and so it is mandatory (it can infer multiple unique indexes, but only when they're basically equivalent for this purpose). I think I agree with you, though: We should change things so that the relcache gives indexes in something like the ordering that you outline, rather than in the current arbitrary (though consistent) OID order. However, I think that this should be done to avoid unnecessary index bloat (fail early), and I don't think it makes much sense to do it on the grounds you outline. This is because you can still easily take the alternative path for the wrong reason, causing subtle "logical corruption". You can still not match all indexes because one index had a would-be dup violation (and so, as I said, it doesn't matter what would have happened with the other ones). Maybe you still get a dup violation from the update, "saving" you, but who wants to rely on that? >> > 2) Compatibility with MySQL >> >> But what you describe isn't compatible with MySQL. It's totally novel. > > > Upthread you said > > "It's trivial to modify Postgres to not require that a specific unique > index be inferred, so that you can omit the inference specification > for DO UPDATE just as you can for DO NOTHING. That would make it work > in a similar way to MySQL" > > Similar is good and useful. Full compatibility is even better. I actually do not feel strongly that it would be terrible to allow the user to omit an inference clause for the DO UPDATE variant (on the grounds of that being closer to MySQL). After all, we don't mandate that the user specifies an explicit targetlist for INSERT, and that seems like a footgun to me. If you want to make the case for doing things that way, I probably will not oppose it. FWIW, I don't think it's unreasonable to have a little discussion on fine points of semantics like that post feature-freeze. -- Peter Geoghegan
Peter Geoghegan <pg@heroku.com> writes: > I think I agree with you, though: We should change things so that the > relcache gives indexes in something like the ordering that you > outline, rather than in the current arbitrary (though consistent) OID > order. I'm fairly sure that there are aspects of the code that rely on indexes being returned by RelationGetIndexList() in a stable order. While I doubt that has to be exactly increasing-OID-order, I'm quite concerned about allowing the order to depend on mutable aspects of the indexes, like names. regards, tom lane
On Tue, May 19, 2015 at 5:03 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Peter Geoghegan <pg@heroku.com> writes: >> I think I agree with you, though: We should change things so that the >> relcache gives indexes in something like the ordering that you >> outline, rather than in the current arbitrary (though consistent) OID >> order. > > I'm fairly sure that there are aspects of the code that rely on indexes > being returned by RelationGetIndexList() in a stable order. While I doubt > that has to be exactly increasing-OID-order, I'm quite concerned about > allowing the order to depend on mutable aspects of the indexes, like > names. I thought the importance of the ordering was just down to some AMs (like hash) using heavyweight locks. This could cause unprincipled deadlocks in the face of an inconsistent ordering. nbtree used to use page-level heavyweight locks many years ago, too, so this used to be a big, obvious requirement. Maybe there is another reason, but AFAICR there are no hints of that from the relevant code, and I've looked carefully. If it was ever changed, I think it could be done in a way that didn't add any problems, assuming I've accounted for all the ways in which changing the ordering could be problematic. -- Peter Geoghegan
It's not clear to me how a single INSERT could cause two or more UPDATEs.
CREATE TABLE mytable (
c1 int NOT NULL,
c2 int NOT NULL,
PRIMARY KEY (c1),
UNIQUE (c2)
);
INSERT INTO mytable (c1, c2) (10, 20);
INSERT INTO mytable (c1, c2) (11, 21);
INSERT INTO mytable (c1, c2) (10, 21) ON CONFLICT DO UPDATE .....
Or did you mean from a coding point of view how it would be possible to implement?
Geoff
On 19 May 2015 at 19:59, Peter Geoghegan <pg@heroku.com> wrote:
--
On Tue, May 19, 2015 at 2:28 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 19 May 2015 at 17:10, Peter Geoghegan <pg@heroku.com> wrote:
>>
>> On Tue, May 19, 2015 at 1:57 PM, Simon Riggs <simon@2ndquadrant.com>
>> wrote:
>> > We should allow DO UPDATE to exclude a constraint and apply a
>> > deterministic
>> > order to the constraints. 1. PK if it exists. 2. Replica Identity, when
>> > not
>> > PK, 3. UNIQUE constraints in name order, like triggers, so users can
>> > define
>> > a default evaluation order, just like they do with triggers.
>>
>> That seems like something way worse than just allowing it for all
>> constraints.
>
>
> I'm talking about the evaluation order; it would still match all
> constraints, otherwise they wouldn't be constraints.
But it doesn't match all constraints when a would-be conflict is
detected.
No not all, but we can evaluate the constraints one at a time in a consistent order.
My point is this: We do not need to explicitly specify the constraint we wish to test to ensure that we get deterministic behaviour. So it is possible to avoid specifying a constraint/conflict target and still get deterministic behaviour (which is essential).
With Postgres, we want to make sure that the user has
put thought into the condition they take that update path on, and so
it is mandatory (it can infer multiple unique indexes, but only when
they're basically equivalent for this purpose).
If I have two constraints and I think about it, I would want to be able to specify this...
INSERT
ON CONFLICT (col1) DO UPDATE... (handle it one way)
ON CONFLICT (col2) DO UPDATE... (handle it 2nd way)
but I cannot with the current syntax.
It seems strange to force the user to think about constraint handling and then not offer them any choices once they have done the thinking.
If the update is the same no matter which constraint is violated, why would I need to specify the constraint? We're forcing the developer to make an arbitrary choice between two constraints.
I actually do not feel strongly that it would be terrible to allow the
user to omit an inference clause for the DO UPDATE variant (on the
grounds of that being closer to MySQL). After all, we don't mandate
that the user specifies an explicit targetlist for INSERT, and that
seems like a footgun to me. If you want to make the case for doing
things that way, I probably will not oppose it. FWIW, I don't think
it's unreasonable to have a little discussion on fine points of
semantics like that post feature-freeze.
We will see many people ask why they have to specify constraints explicitly.
As I've pointed out, if the underlying model changes then you now have to explicitly recode all the SQL as well AND time that exactly so you roll out the new code at the same time you add/change constraints. That makes it much harder to use this feature than I would like.
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 20 May 2015 at 05:49, Geoff Winkless <pgsqladmin@geoff.dj> wrote:
--
It's not clear to me how a single INSERT could cause two or more UPDATEs.CREATE TABLE mytable (c1 int NOT NULL,c2 int NOT NULL,PRIMARY KEY (c1),UNIQUE (c2));INSERT INTO mytable (c1, c2) (10, 20);INSERT INTO mytable (c1, c2) (11, 21);INSERT INTO mytable (c1, c2) (10, 21) ON CONFLICT DO UPDATE .....Or did you mean from a coding point of view how it would be possible to implement?
I mean "how could that possibly have useful meaning?".
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, May 21, 2015 at 9:51 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > No not all, but we can evaluate the constraints one at a time in a > consistent order. We do so currently. Now, you point out that that might not be the most useful ordering, and as it happens I agree. But changing that ordering to not just be OID-ordering, but to put the PK first (and so on) isn't going to fundamentally change anything. FWIW, I think that that much (PK first) will usually accidentally be true anyway, because of the way that create table statement is originally executed. > My point is this: We do not need to explicitly specify the constraint we > wish to test to ensure that we get deterministic behaviour. So it is > possible to avoid specifying a constraint/conflict target and still get > deterministic behaviour (which is essential). It is deterministic, but omitting an inference specification still risks taking the wrong path. You seem not be acknowledging that you can still take the wrong path due to a dup violation in the wrong constraint. So being guaranteed to have observed or not observed a would-be dup violation in the PK does not buy much. > If I have two constraints and I think about it, I would want to be able to > specify this... > > INSERT > ON CONFLICT (col1) DO UPDATE... (handle it one way) > ON CONFLICT (col2) DO UPDATE... (handle it 2nd way) > > but I cannot with the current syntax. > > It seems strange to force the user to think about constraint handling and > then not offer them any choices once they have done the thinking. What if both constraints are violated? Won't the update end up in trouble? > If the update is the same no matter which constraint is violated, why would > I need to specify the constraint? We're forcing the developer to make an > arbitrary choice between two constraints. Why would the update be the same, though? How could that make sense? You're still going to have to update both unique-indexed columns with something, and that could fail. > We will see many people ask why they have to specify constraints explicitly. I'm not sure that we will, actually, but as I said, go ahead and propose removing the restriction if you think it's important (maybe start a thread on it). > As I've pointed out, if the underlying model changes then you now have to > explicitly recode all the SQL as well AND time that exactly so you roll out > the new code at the same time you add/change constraints. That makes it much > harder to use this feature than I would like. If the underlying model changes, then it's good that your queries break, because they're predicated on the original model. I don't think that happens very often at all. What is much more routine - adding redundant indexes to reindex using CREATE INDEX CONCURRENTLY, or changing the predicate on whatever partial unique indexes happen to be defined on the table - is handled gracefully. -- Peter Geoghegan
On 21 May 2015 at 14:25, Peter Geoghegan <pg@heroku.com> wrote:
--
> If I have two constraints and I think about it, I would want to be able to
> specify this...
>
> INSERT
> ON CONFLICT (col1) DO UPDATE... (handle it one way)
> ON CONFLICT (col2) DO UPDATE... (handle it 2nd way)
>
> but I cannot with the current syntax.
>
> It seems strange to force the user to think about constraint handling and
> then not offer them any choices once they have done the thinking.
What if both constraints are violated? Won't the update end up in trouble?
Great question. We don't handle that at the moment. So how do we handle that?
> If the update is the same no matter which constraint is violated, why would
> I need to specify the constraint? We're forcing the developer to make an
> arbitrary choice between two constraints.
Why would the update be the same, though?
*If* is the keyword there.
How could that make sense?
It wouldn't, that is the point. So why does the current syntax force that?
You're still going to have to update both unique-indexed columns with
something, and that could fail.
ISTM clear that you might want to handle each kind of violation differently, but we cannot.
> We will see many people ask why they have to specify constraints explicitly.
I'm not sure that we will, actually, but as I said, go ahead and
propose removing the restriction if you think it's important (maybe
start a thread on it).
I am. I have. Many times. What is wrong with this thread or all of the other times I said it?
Please look at the $SUBJECT of this thread. We're here now.
> As I've pointed out, if the underlying model changes then you now have to
> explicitly recode all the SQL as well AND time that exactly so you roll out
> the new code at the same time you add/change constraints. That makes it much
> harder to use this feature than I would like.
If the underlying model changes, then it's good that your queries
break, because they're predicated on the original model. I don't think
that happens very often at all.
If it seldom happens, then why do we need to specify the conflict-target? If I know there is only one unique constraint, why can I not rely upon that knowledge?
What is much more routine - adding
redundant indexes to reindex using CREATE INDEX CONCURRENTLY, or
changing the predicate on whatever partial unique indexes happen to be
defined on the table - is handled gracefully.
What has CREATE INDEX CONCURRENTLY got to do with this? If you don't specify the conflict-target at all, it wouldn't matter what the indexes are. If you have two indexes the same then it clearly wouldn't matter which one was checked first.
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
* Simon Riggs (simon@2ndQuadrant.com) wrote: > On 21 May 2015 at 14:25, Peter Geoghegan <pg@heroku.com> wrote: > > > If the update is the same no matter which constraint is violated, why > > would > > > I need to specify the constraint? We're forcing the developer to make an > > > arbitrary choice between two constraints. > > > > Why would the update be the same, though? > > *If* is the keyword there. Agreed. > > We will see many people ask why they have to specify constraints > > explicitly. > > > > I'm not sure that we will, actually, but as I said, go ahead and > > propose removing the restriction if you think it's important (maybe > > start a thread on it). > > > > I am. I have. Many times. What is wrong with this thread or all of the > other times I said it? > > Please look at the $SUBJECT of this thread. We're here now. I've also asked for this. Thanks! Stephen
On Thu, May 21, 2015 at 11:55 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >> > It seems strange to force the user to think about constraint handling >> > and >> > then not offer them any choices once they have done the thinking. >> >> What if both constraints are violated? Won't the update end up in trouble? > > > Great question. We don't handle that at the moment. So how do we handle > that? By writing two separate INSERT ... ON CONFLICT DO UPDATE statements? There is very little or no disadvantage to doing it that way. >> > If the update is the same no matter which constraint is violated, why >> > would >> > I need to specify the constraint? We're forcing the developer to make an >> > arbitrary choice between two constraints. >> >> Why would the update be the same, though? > > > *If* is the keyword there. I'm having a hard time imagining a scenario in which the update would be the same. That's why I asked how it could be. I'm asking for a practical example involving plausible business rules. >> How could that make sense? > > > It wouldn't, that is the point. So why does the current syntax force that? > >> >> You're still going to have to update both unique-indexed columns with >> something, and that could fail. > > > ISTM clear that you might want to handle each kind of violation differently, > but we cannot. I think you can -- with two statements. >> > We will see many people ask why they have to specify constraints >> > explicitly. >> >> I'm not sure that we will, actually, but as I said, go ahead and >> propose removing the restriction if you think it's important (maybe >> start a thread on it). > > > I am. I have. Many times. What is wrong with this thread or all of the other > times I said it? > > Please look at the $SUBJECT of this thread. We're here now. What do you want me to do about it? I've said that I think that what you say about not mandating the inference clause in the parser could be okay. If you want to change it, obviously you're going to need to get some buy in, and this thread could easily be missed. I'm not willing to defend mandating it, and I'm not willing to argue for removing it (to be clear, I think being able to infer a unique index is very important, but that doesn't mean that I'm attached to mandating it for UPDATE). That's all. >> > As I've pointed out, if the underlying model changes then you now have >> > to >> > explicitly recode all the SQL as well AND time that exactly so you roll >> > out >> > the new code at the same time you add/change constraints. That makes it >> > much >> > harder to use this feature than I would like. >> >> If the underlying model changes, then it's good that your queries >> break, because they're predicated on the original model. I don't think >> >> that happens very often at all. > > > If it seldom happens, then why do we need to specify the conflict-target? If > I know there is only one unique constraint, why can I not rely upon that > knowledge? You say that as if I'm giving you pushback on that point -- for the third time, I'm not. If there is more than one unique constraint (or if there might be in the future), why take the chance that the update will take the wrong path? I'm not saying that that's the overriding consideration, but it is certainly a big consideration. >> What is much more routine - adding >> redundant indexes to reindex using CREATE INDEX CONCURRENTLY, or >> changing the predicate on whatever partial unique indexes happen to be >> defined on the table - is handled gracefully. > > > What has CREATE INDEX CONCURRENTLY got to do with this? If you don't specify > the conflict-target at all, it wouldn't matter what the indexes are. If you > have two indexes the same then it clearly wouldn't matter which one was > checked first. I'm not talking about that here. What I meant is that changes to unique indexes that don't affect the underlying model (as you put it) don't break your queries. Changes that do *will* break your queries. And that's definitely a good thing. I am pretty neutral on whether it's right to mandate that DO UPDATE statements *must* buy in to this. -- Peter Geoghegan
On 21 May 2015 at 15:44, Peter Geoghegan <pg@heroku.com> wrote:
--
> Please look at the $SUBJECT of this thread. We're here now.
What do you want me to do about it? I've said that I think that what
you say about not mandating the inference clause in the parser could
be okay. If you want to change it, obviously you're going to need to
get some buy in, and this thread could easily be missed. I'm not
willing to defend mandating it, and I'm not willing to argue for
removing it (to be clear, I think being able to infer a unique index
is very important, but that doesn't mean that I'm attached to
mandating it for UPDATE). That's all.
OK, let me summarise. First, thanks for putting time into this feature; we all wish to see it work and work well.
The current ON CONFLICT syntax requires us to specify one-and-only-one conflict_target/conflict_action pair. I would like to be able to specify 0, 1 or more conflict_targets, as the developer desires.
It is very desirable to be able to specify DO UPDATE without any conflict_target, relying instead on our ability to infer a conflict_target deterministically. That is the way other systems work and we should be aiming to provide similar ease of use. Having said that, we all recognize that MySQL is broken for multiple constraints and we have done well to come up with a design that allows us to specify finer grained control when we have multiple constraints. (Ideally, we would use the identical syntax to MySQL, but that is secondary to simply avoiding specifying a conflict_target).
If we do have multiple constraints then we should be allowed to specify multiple conflict_target/conflict_action pairs (or similar), since few people believe that one conflict_action would cover the various permutations that occur with multiple potential constraint failures.
In summary, the current design seeks to overcome the problems of having multiple constraints, but doesn't yet do so in a flexible (0) or complete (>1) way.
As the patch author I hope and expect that you will listen to this and consider how you will resolve these problems, just as any of us has done when they are the patch author, even after commit. I would like to see this happen now before we get hit with usage questions similar to OP's. If both requests cannot happen now, if we can at least agree a path for future enhancement we can refer people to what will happen in later releases when they ask.
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, May 21, 2015 at 1:15 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > OK, let me summarise. First, thanks for putting time into this feature; we > all wish to see it work and work well. You're welcome. > The current ON CONFLICT syntax requires us to specify one-and-only-one > conflict_target/conflict_action pair. I would like to be able to specify 0, > 1 or more conflict_targets, as the developer desires. Well, multiple unique indexes (that represent essentially the same business rule) can be inferred at the same time, for edge-cases around migrations and so on. > It is very desirable to be able to specify DO UPDATE without any > conflict_target, relying instead on our ability to infer a conflict_target > deterministically. That is the way other systems work and we should be > aiming to provide similar ease of use. Having said that, we all recognize > that MySQL is broken for multiple constraints and we have done well to come > up with a design that allows us to specify finer grained control when we > have multiple constraints. (Ideally, we would use the identical syntax to > MySQL, but that is secondary to simply avoiding specifying a > conflict_target). Okay. No real argument here so far. > If we do have multiple constraints then we should be allowed to specify > multiple conflict_target/conflict_action pairs (or similar), since few > people believe that one conflict_action would cover the various permutations > that occur with multiple potential constraint failures. > > In summary, the current design seeks to overcome the problems of having > multiple constraints, but doesn't yet do so in a flexible (0) or complete > (>1) way. My difficulty with this (which seems distinct to the concern about not mandating an inference specification, a concern which seems to only be about laziness and/or MySQL compatibility) is that I think you'll have a very hard time finding a case where the update naturally applies to the path when either constraint is taken, and applies indifferently. After all, and as I said, why should you not fail when updating the *other* constrained column in the update? Also, why should you not have to worry about *both* constraints failing at once (from the insert)? I think that if we try and address these cases, we'll end up with something unusable, complicated, and no better than simply writing two statements. > As the patch author I hope and expect that you will listen to this and > consider how you will resolve these problems, just as any of us has done > when they are the patch author, even after commit. I would like to see this > happen now before we get hit with usage questions similar to OP's. If both > requests cannot happen now, if we can at least agree a path for future > enhancement we can refer people to what will happen in later releases when > they ask. That's reasonable. I only ask that you describe a plausible use case. Let's start with that. Try and convince me. -- Peter Geoghegan
On Thu, May 21, 2015 at 1:27 PM, Peter Geoghegan <pg@heroku.com> wrote: >> As the patch author I hope and expect that you will listen to this and >> consider how you will resolve these problems, just as any of us has done >> when they are the patch author, even after commit. I would like to see this >> happen now before we get hit with usage questions similar to OP's. If both >> requests cannot happen now, if we can at least agree a path for future >> enhancement we can refer people to what will happen in later releases when >> they ask. > > That's reasonable. I only ask that you describe a plausible use case. > Let's start with that. Try and convince me. Also, by not taking a firm position one way or the other on the first point, the question of whether or not it is appropriate to mandate an inference specification for the DO UPDATE variant, I am not simply being indifferent. Rather, I don't know one way or the other. I reserve judgement. -- Peter Geoghegan
On 21 May 2015 at 16:27, Peter Geoghegan <pg@heroku.com> wrote:
We might choose these rules:
--
Try and convince me.
(There is no "try")
CREATE TABLE customers
(username TEXT PRIMARY KEY
,email TEXT UNIQUE
,billing NUMERIC(11,2)
);
1. INSERT INTO customers VALUES ('sriggs', 'simon@2ndQuadrant.com', 10.0);
2. INSERT INTO customers VALUES ('sriggs', 'simon@2ndQuadrant.com', 10.0);
3. INSERT INTO customers VALUES ('sriggs2', 'simon@2ndQuadrant.com', 10.0);
4. INSERT INTO customers VALUES ('sriggs', 'simon.riggs@2ndQuadrant.com', 10.0);
How should we choose to handle the above data?
If no constraints violate, accept insert
If both constraints violate, treat it as a repeat record and just set the billing to the new value.
If first constraint violates but not second, treat it as an email address update AND increment the billing, if any
If second constraint violates, reject the row since we only allow one userid per person
With 2 constraints we have 4 permutations, i.e. O(2^N) permutations. If we are claiming to handle multiple constraints, I don't think we should just assume that they can all use the same UPDATE. I might point out that the MERGE syntax allowed us to handle that fully, but you'll probably scream.
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
<p dir="ltr"><br /> On 21 May 2015 21:15, "Simon Riggs" <<a href="mailto:simon@2ndquadrant.com">simon@2ndquadrant.com</a>>wrote:<br /> > I would like to see this happen now beforewe get hit with usage questions similar to OP's. If both requests cannot happen now, if we can at least agree a pathfor future enhancement we can refer people to what will happen in later releases when they ask.<p dir="ltr">FWIW as theOP I was merely asking for clarification as to why it was required for UPDATE and not DO NOTHING. <p dir="ltr">The discussionhas been enough to clarify for me: putting something along the lines of what was discussed in this thread in thedocumentation alongside the requirement would, IMO, be sufficient.<p dir="ltr">I'm not trying to suggest that I representeveryone and that the discussion is over, but as you pointed at the OP I thought I should put in where I am withthis now.<p dir="ltr">Ta<p dir="ltr">Geoff<br />
On Thu, May 21, 2015 at 1:50 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > (There is no "try") > > CREATE TABLE customers > (username TEXT PRIMARY KEY > ,email TEXT UNIQUE > ,billing NUMERIC(11,2) > ); > > 1. INSERT INTO customers VALUES ('sriggs', 'simon@2ndQuadrant.com', 10.0); > 2. INSERT INTO customers VALUES ('sriggs', 'simon@2ndQuadrant.com', 10.0); > 3. INSERT INTO customers VALUES ('sriggs2', 'simon@2ndQuadrant.com', 10.0); > 4. INSERT INTO customers VALUES ('sriggs', 'simon.riggs@2ndQuadrant.com', > 10.0); Presumably you meant to indicate that these were upserts that lacked an explicit inference specification. > How should we choose to handle the above data? I don't know. > We might choose these rules: > If no constraints violate, accept insert > If both constraints violate, treat it as a repeat record and just set the > billing to the new value. > If first constraint violates but not second, treat it as an email address > update AND increment the billing, if any > If second constraint violates, reject the row since we only allow one userid > per person > With 2 constraints we have 4 permutations, i.e. O(2^N) permutations. If we > are claiming to handle multiple constraints, I don't think we should just > assume that they can all use the same UPDATE. I might point out that the > MERGE syntax allowed us to handle that fully, but you'll probably scream. Well, MERGE doesn't, because it doesn't know anything about unique indexes or concurrency. And in practice the join condition is almost always an equi-join, with SQL server for example strongly advising against putting much of anything in the join. Anyway, I think that I might get what you're saying now. ISTM that this could almost be accomplished without having multiple unique constraints inferred in the way I thought you meant. You'd be using all available unique indexes as arbiters, say. Any one could force us to take the alternative path. You wouldn't have any way to be directly introspect which unique index forced the update path to be taken, but maybe that doesn't matter - you can figure it out another way. In this future version of upsert (this future version that I think is workable), you can chain together multiple "DO UPDATE WHERE .... ELSE DO DELETE WHERE ... " style handlers. You can handle each case of yours at that level, by referencing the EXCLUDED.* and target alias in each WHERE clause. This is closer to SQL MERGE (but unlike SQL MERGE, you can't avoid an insert sometimes -- we always need that to terminate the loop to maintain the useful upsert guarantees that MERGE lacks). This gets you most of the way there. Once you heap_lock_tuple() the row (before going on to consider an update), you can be sure that *all* values appearing in the existing target tuple are also locked, just because the row is locked. You can't be sure that the update changing (say) the e-mail field within the update won't then get a duplicate violation, so I think this isn't 100% of what you're looking for, but not too far off. However, I have a hard time believing that really ensuring no constraint violation on either of *both* constraints from the update (or doing anything to avoid dup violations from an update) will ever become a sensible user-visible feature. I'm already playing games with MVCC. It's just too complicated. When you get an dup violation from (say) updating the e-mail address is probably something that needs to be shown to the webapp user or whatever anyway. BTW, I tried to make updates use the speculative insertion infrastructure at one point at Heikki's request, and it was utterly intractable (MVCC snapshots cannot see speculatively inserted tuples, but that goes out the window when updates need to work with it). But that's incidental - my objection above is that doing 100% of what you ask for is conceptually very hard to pin down. -- Peter Geoghegan