Thread: [HACKERS] INSERT .. ON CONFLICT DO SELECT [FOR ..]
Hi,
Attached is a patch for $SUBJECT. It might still be a bit rough around the edges and probably light on docs and testing, but I thought I'd post it anyway so I won't forget.Attachment
On Mon, Aug 14, 2017 at 6:23 PM, Marko Tiikkaja <marko@joh.to> wrote: > Attached is a patch for $SUBJECT. It might still be a bit rough around the > edges and probably light on docs and testing, but I thought I'd post it > anyway so I won't forget. Is it possible for ON CONFLICT DO SELECT to raise a cardinality violation error? Why or why not? -- Peter Geoghegan
On Tue, Aug 15, 2017 at 7:43 AM, Peter Geoghegan <pg@bowt.ie> wrote:
On Mon, Aug 14, 2017 at 6:23 PM, Marko Tiikkaja <marko@joh.to> wrote:
> Attached is a patch for $SUBJECT. It might still be a bit rough around the
> edges and probably light on docs and testing, but I thought I'd post it
> anyway so I won't forget.
Is it possible for ON CONFLICT DO SELECT to raise a cardinality
violation error? Why or why not?
No. I don't see when that would need to happen. But I'm guessing you have a case in mind?
.m
On Tue, Aug 15, 2017 at 12:17 AM, Marko Tiikkaja <marko@joh.to> wrote: > On Tue, Aug 15, 2017 at 7:43 AM, Peter Geoghegan <pg@bowt.ie> wrote: >> >> On Mon, Aug 14, 2017 at 6:23 PM, Marko Tiikkaja <marko@joh.to> wrote: >> > Attached is a patch for $SUBJECT. It might still be a bit rough around >> > the >> > edges and probably light on docs and testing, but I thought I'd post it >> > anyway so I won't forget. >> >> Is it possible for ON CONFLICT DO SELECT to raise a cardinality >> violation error? Why or why not? > > > No. I don't see when that would need to happen. But I'm guessing you have > a case in mind? Actually, no, I didn't. But I wondered if you did. I think that it makes some sense not to, now that I think about it. ON CONFLICT DO NOTHING doesn't have cardinality violations, because it cannot affect a row twice if there are duplicates proposed for insertion (at least, not through any ON CONFLICT related codepath). But, this opinion only applies to ON CONFLICT DO SELECT, not ON CONFLICT DO SELECT FOR UPDATE. And I have other reservations, which I'll go in to momentarily, about ON CONFLICT DO SELECT in general. So, the upshot is that I think we need cardinality violations in all cases for this feature. Why would a user ever not want to know that the row was locked twice? On to the subject of my more general reservation: Why did you include ON CONFLICT DO SELECT at all? Why not just ON CONFLICT DO SELECT FOR UPDATE (and FOR SHARE, ...) ? I think I know what you're going to say about it: ON CONFLICT DO NOTHING doesn't lock the conflicting row, so why should I insist on it here (why not have an ON CONFLICT DO SELECT variant, too)? That's not a bad argument. However, I think that there is still a difference. Namely, ON CONFLICT DO NOTHING doesn't let you project the rows with RETURNING (that may not even be visible to the command's MVCC snapshot -- the rows that we also don't lock), because those are simply not the semantics it has. DO NOTHING is more or less about ETL use-cases, some of which involve data from very unclean sources. It seems questionable to cite that as precedent for not locking a row (that is, for having a plain ON CONFLICT DO SELECT variant at all). In other words, while ON CONFLICT DO NOTHING may have set a precedent here, it at least has semantics that limit the consequences of not locking the row; and it *feels* a little bit dirty to use it indifferently, even where that makes sense. ON CONFLICT DO SELECT is probably going to be used within wCTEs some of the time. I'm not sure that a plain ON CONFLICT DO SELECT variant won't allow unpredictable, complicated problems when composed within a more complicated query. (This brings me back!) In other other words, plain SELECT FOR UPDATE has to do all the EPQ stuff, in order to confront many of the same issues that ON CONFLICT must confront in its own way [1], but a plain SELECT never does any EPQ stuff at all. It seems to follow that a plain ON CONFLICT DO SELECT is an oxymoron. If I've missed the point of ON CONFLICT DO SELECT, please let me know how. [1] https://wiki.postgresql.org/wiki/UPSERT#Visibility_issues_and_the_proposed_syntax_.28WHERE_clause.2Fpredicate_stuff.29 -- Peter Geoghegan
On Mon, Sep 4, 2017 at 4:09 AM, Peter Geoghegan <pg@bowt.ie> wrote:
On Tue, Aug 15, 2017 at 12:17 AM, Marko Tiikkaja <marko@joh.to> wrote:
> On Tue, Aug 15, 2017 at 7:43 AM, Peter Geoghegan <pg@bowt.ie> wrote:
>>
>> On Mon, Aug 14, 2017 at 6:23 PM, Marko Tiikkaja <marko@joh.to> wrote:
>> > Attached is a patch for $SUBJECT. It might still be a bit rough around
>> > the
>> > edges and probably light on docs and testing, but I thought I'd post it
>> > anyway so I won't forget.
>>
>> Is it possible for ON CONFLICT DO SELECT to raise a cardinality
>> violation error? Why or why not?
>
>
> No. I don't see when that would need to happen. But I'm guessing you have
> a case in mind?
Actually, no, I didn't. But I wondered if you did. I think that it
makes some sense not to, now that I think about it. ON CONFLICT DO
NOTHING doesn't have cardinality violations, because it cannot affect
a row twice if there are duplicates proposed for insertion (at least,
not through any ON CONFLICT related codepath). But, this opinion only
applies to ON CONFLICT DO SELECT, not ON CONFLICT DO SELECT FOR
UPDATE. And I have other reservations, which I'll go in to
momentarily, about ON CONFLICT DO SELECT in general. So, the upshot is
that I think we need cardinality violations in all cases for this
feature. Why would a user ever not want to know that the row was
locked twice?
I had to look at the patch to see what I'd done, and the tests suggest that we already complain about this with if a FOR [lock level] clause is present:
+begin transaction isolation level read committed;
+insert into selfconflict values (10,1), (10,2) on conflict(f1) do select for update returning *;
+ERROR: ON CONFLICT command cannot affect row a second time
+HINT: Ensure that no rows proposed for insertion within the same command have duplicate constrained values.
+commit;
+begin transaction isolation level read committed;
+insert into selfconflict values (10,1), (10,2) on conflict(f1) do select for update returning *;
+ERROR: ON CONFLICT command cannot affect row a second time
+HINT: Ensure that no rows proposed for insertion within the same command have duplicate constrained values.
+commit;
(in expected/insert_conflict.out.)
On to the subject of my more general reservation: Why did you include
ON CONFLICT DO SELECT at all? Why not just ON CONFLICT DO SELECT FOR
UPDATE (and FOR SHARE, ...) ?
If you don't intend to actually do anything with the row in the same database transaction, locking seems unnecessary. For example, you might want to provide an idempotent method in your API which inserts the data and returns the ID to the caller. The transaction is committed by the time the client sees the data, so locking is just extra overhead.
I think I know what you're going to say about it: ON CONFLICT DO
NOTHING doesn't lock the conflicting row, so why should I insist on it
here (why not have an ON CONFLICT DO SELECT variant, too)?
I wasn't going to say that, no.
In other words, while ON CONFLICT DO NOTHING may have set a precedent
here, it at least has semantics that limit the consequences of not
locking the row; and it *feels* a little bit dirty to use it
indifferently, even where that makes sense. ON CONFLICT DO SELECT is
probably going to be used within wCTEs some of the time. I'm not sure
that a plain ON CONFLICT DO SELECT variant won't allow unpredictable,
complicated problems when composed within a more complicated query.
Yeah, in most cases you'd probably do a SELECT FOR KEY SHARE. And I wouldn't mind that being the default, but it would mean inventing special syntax with no previous precedent (as far as I know) for not locking the row in cases where the user doesn't want that. And that doesn't seem too attractive, either.
Maybe it would be better to make the default sensible for people who are not super familiar with postgres. I don't know. And the more I think about the use case above, the less I care about it. But I'm generally against interfaces which put arbitrary restrictions on what power users can do on the basis that less experienced people might misuse the interface.
.m
On Mon, Sep 4, 2017 at 10:05 AM, Marko Tiikkaja <marko@joh.to> wrote: > I had to look at the patch to see what I'd done, and the tests suggest that > we already complain about this with if a FOR [lock level] clause is present: > > +begin transaction isolation level read committed; > +insert into selfconflict values (10,1), (10,2) on conflict(f1) do select > for update returning *; > +ERROR: ON CONFLICT command cannot affect row a second time > +HINT: Ensure that no rows proposed for insertion within the same > command have duplicate constrained values. > +commit; > > (in expected/insert_conflict.out.) Right. I saw that you do it for ON CONFLICT DO SELECT FOR UPDATE, and so on. >> On to the subject of my more general reservation: Why did you include >> ON CONFLICT DO SELECT at all? Why not just ON CONFLICT DO SELECT FOR >> UPDATE (and FOR SHARE, ...) ? > > > If you don't intend to actually do anything with the row in the same > database transaction, locking seems unnecessary. For example, you might > want to provide an idempotent method in your API which inserts the data and > returns the ID to the caller. The transaction is committed by the time the > client sees the data, so locking is just extra overhead. That makes sense, but I personally feel that the plain ON CONFLICT DO SELECT variant isn't worth the trouble. I welcome other people's opinions on that. >> I think I know what you're going to say about it: ON CONFLICT DO >> NOTHING doesn't lock the conflicting row, so why should I insist on it >> here (why not have an ON CONFLICT DO SELECT variant, too)? > > > I wasn't going to say that, no. Well, it was a foundation for the ON CONFLICT DO SELECT variant that I actually agree with, in any case. >> In other words, while ON CONFLICT DO NOTHING may have set a precedent >> here, it at least has semantics that limit the consequences of not >> locking the row; and it *feels* a little bit dirty to use it >> indifferently, even where that makes sense. ON CONFLICT DO SELECT is >> probably going to be used within wCTEs some of the time. I'm not sure >> that a plain ON CONFLICT DO SELECT variant won't allow unpredictable, >> complicated problems when composed within a more complicated query. > > > Yeah, in most cases you'd probably do a SELECT FOR KEY SHARE. And I > wouldn't mind that being the default, but it would mean inventing special > syntax with no previous precedent (as far as I know) for not locking the row > in cases where the user doesn't want that. And that doesn't seem too > attractive, either. I'm not in favor of spellings that are inconsistent with SELECT FOR ... . > Maybe it would be better to make the default sensible for people who are not > super familiar with postgres. I don't know. And the more I think about the > use case above, the less I care about it. I was going to ask you to offer a real-world example of where the plain ON CONFLICT DO SELECT variant is compelling. If you care about it (if you're not going to concede the point), then I still suggest doing so. > But I'm generally against > interfaces which put arbitrary restrictions on what power users can do on > the basis that less experienced people might misuse the interface. I agree with that as a broad principle, but disagree that it applies here. Or if it does, then you have yet to convince me of it. In all sincerity, if you think that I just don't understand your perspective, then please try to make me understand it. Would a power user actually miss ON CONFLICT DO SELECT? And if that's the case, would it really be something they'd remember 5 minutes later? I actually think that ON CONFLICT DO NOTHING does have semantics that are, shall we say, questionable. That's the cost of having it not lock conflicting rows during big ETL operations. That's a huge practical benefit for ETL use-cases. Whereas here, with ON CONFLICT DO SELECT, I see a somewhat greater risk, and a much, much smaller benefit. A benefit that might actually be indistinguishable from zero. -- Peter Geoghegan
On Mon, Sep 4, 2017 at 7:46 PM, Peter Geoghegan <pg@bowt.ie> wrote:
On Mon, Sep 4, 2017 at 10:05 AM, Marko Tiikkaja <marko@joh.to> wrote:
> But I'm generally against
> interfaces which put arbitrary restrictions on what power users can do on
> the basis that less experienced people might misuse the interface.
I agree with that as a broad principle, but disagree that it applies
here. Or if it does, then you have yet to convince me of it. In all
sincerity, if you think that I just don't understand your perspective,
then please try to make me understand it. Would a power user actually
miss ON CONFLICT DO SELECT? And if that's the case, would it really be
something they'd remember 5 minutes later?
I don't know, but I don't want to be limiting what people can do just because I can't come up with a use case.
In any case, others, feel free to chime in.
.m
On Tue, Sep 5, 2017 at 3:28 AM, Marko Tiikkaja <marko@joh.to> wrote: > On Mon, Sep 4, 2017 at 7:46 PM, Peter Geoghegan <pg@bowt.ie> wrote: >> >> On Mon, Sep 4, 2017 at 10:05 AM, Marko Tiikkaja <marko@joh.to> wrote: >> >> > But I'm generally against >> > interfaces which put arbitrary restrictions on what power users can do >> > on >> > the basis that less experienced people might misuse the interface. >> >> I agree with that as a broad principle, but disagree that it applies >> here. Or if it does, then you have yet to convince me of it. In all >> sincerity, if you think that I just don't understand your perspective, >> then please try to make me understand it. Would a power user actually >> miss ON CONFLICT DO SELECT? And if that's the case, would it really be >> something they'd remember 5 minutes later? > > > I don't know, but I don't want to be limiting what people can do just > because I can't come up with a use case. > > In any case, others, feel free to chime in. The patch does not currently apply. I am noticing as well that Peter Geoghegan has registered himself as a reviewer a couple of hours back, so moved to next CF with waiting on author as status. -- Michael
On Wed, Nov 29, 2017 at 8:34 PM, Michael Paquier <michael.paquier@gmail.com> wrote: > The patch does not currently apply. I am noticing as well that Peter > Geoghegan has registered himself as a reviewer a couple of hours back, > so moved to next CF with waiting on author as status. Marko unregistered me, so I promptly reregistered. You'll have to ask him why. -- Peter Geoghegan
On Thu, Nov 30, 2017 at 6:39 AM, Peter Geoghegan wrote:
> On Wed, Nov 29, 2017 at 8:34 PM, Michael Paquier
> wrote:
> > The patch does not currently apply. I am noticing as well that Peter
> > Geoghegan has registered himself as a reviewer a couple of hours back,
> > so moved to next CF with waiting on author as status.
>
> Marko unregistered me, so I promptly reregistered. You'll have to ask him
> why.
Oops. I didn't mean to do what I did, and I apologize. I had my months
mixed up and I thought this commit fest had ended without you having looked
at the patch, so I assumed you would register in the next CF if you were
still interested, or someone else could pick it up.
In any case, *I* don't have any interest in pursuing this patch at this
point. I think this is a really good feature, and as far as I know the
patch is technically solid, or at least was when it still applied. Can
someone else take it from here?
.m
Greetings, * Marko Tiikkaja (marko@joh.to) wrote: > On Thu, Nov 30, 2017 at 6:39 AM, Peter Geoghegan <pg@bowt.ie> wrote: > > On Wed, Nov 29, 2017 at 8:34 PM, Michael Paquier > > <michael.paquier@gmail.com> wrote: > > > The patch does not currently apply. I am noticing as well that Peter > > > Geoghegan has registered himself as a reviewer a couple of hours back, > > > so moved to next CF with waiting on author as status. > > > > Marko unregistered me, so I promptly reregistered. You'll have to ask him > > why. > > Oops. I didn't mean to do what I did, and I apologize. I had my months > mixed up and I thought this commit fest had ended without you having looked > at the patch, so I assumed you would register in the next CF if you were > still interested, or someone else could pick it up. > > In any case, *I* don't have any interest in pursuing this patch at this > point. I think this is a really good feature, and as far as I know the > patch is technically solid, or at least was when it still applied. Can > someone else take it from here? Given that there hasn't been any further progress on this, I'm going to go ahead and mark it as returned with feedback. It'll be in the archives and in the CF app if someone decides to pick it up but there doesn't seem much point leaving it as Waiting for Author in this CF when the author doesn't have further time/interest in it. Thanks! Stephen