Re: UPSERT wiki page, and SQL MERGE syntax - Mailing list pgsql-hackers

From Robert Haas
Subject Re: UPSERT wiki page, and SQL MERGE syntax
Date
Msg-id CA+TgmoY7DYLtq=qaB9CajF=_qFy-yAcwO5qipbH30BsDcE1c+g@mail.gmail.com
Whole thread Raw
In response to Re: UPSERT wiki page, and SQL MERGE syntax  (Peter Geoghegan <pg@heroku.com>)
Responses Re: UPSERT wiki page, and SQL MERGE syntax
List pgsql-hackers
On Fri, Oct 10, 2014 at 4:41 PM, Peter Geoghegan <pg@heroku.com> wrote:
> On Fri, Oct 10, 2014 at 12:09 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> I think what's realistic here is that the patch isn't going to get
>> committed the way you have it today, because nobody else likes that
>> design.  That may be harsh, but I think it's accurate.
>
> I do think that's harsh, because it's unnecessary: I am looking for
> the best design. If you want to propose alternatives, great, but there
> is a reason why I've done things that way, and that should be
> acknowledged. I too think naming the unique index is ugly as sin, and
> have said as much multiple times. We're almost on the same page here.

Come on.  You've had the syntax that way for a while, multiple people
have objected to it, and it didn't get changed.  When people renewed
their objections, you said that they were not having a "realistic
conversation".  I'm tired of getting critiqued because there's some
fine point of one of the scores of emails you've sent on this topic
that you feel I haven't adequately appreciated.  I would like to avoid
getting into a flame-war here where we spend a lot of time arguing
about who should have said what in exactly what way, but I think it is
fair to say that your emails have come across to me, and to a number
of other people here, as digging in your heels and refusing to budge.
I would go so far as to say that said perception is the primary reason
why this patch is making so little progress, far outstripping whatever
the actual technical issues are.

>> Whatever the actual behavior
>> of your patch is today, it seems to me that the conflict is,
>> fundamentally, over a set of column values, NOT over a particular
>> index.  The index is simply a mechanism for noticing that conflict.
>
> I think that this is the kernel of our disagreement. The index is not
> simply a mechanism for noticing that conflict. The user had better
> have one unique index in mind to provoke taking the alternative path
> in the event of a would-be unique violation. Clearly it doesn't matter
> much in this particular example. But what if there were two partial
> unique indexes, that were actually distinct, but only in terms of the
> constant appearing in their predicate? And what about having that
> changed by a before insert row-level trigger? Are we to defer deciding
> which unique index to use at the last possible minute?

I don't immediately see why this would cause a problem.  IIUC, the
scenario we're talking about is:

create table foo (a int, b int);
create unique index foo1 on foo (a) where b = 1;
create unique index foo2 on foo (a) where b = 2;

If the user issues INSERT .. ON DUPLICATE (a) UPDATE, we'll fire
before-insert triggers and then inspect the tuple to be inserted.  If
b is neither 1 nor 2, then we'll fail with an error saying that we
can't support ON DUPLICATE without a relevant index to enforce
uniqueness.  (This can presumably re-use the same error message that
would have popped out if the user done ON DUPLICATE (b), which is
altogether un-indexed.)  But if b is 1 or 2, then we'll search the
matching index for a conflicting tuple; finding none, we'll insert;
finding one, we'll do the update as per the user's instructions.

> As always with this stuff, the devil is in the details. If we work out
> the details, then I can come up with something that's acceptable to
> everyone. Would you be okay with this never working with partial
> unique indexes? That gives me something to work with.

If it can't be made to work with them in a reasonable fashion, I would
probably be OK with that, but so far I see no reason for such
pessimism.

>>>> Also, how about making the SET clause optional, with the semantics
>>>> that we just update all of the fields for which a value is explicitly
>>>> specified:
>>>>
>>>> INSERT INTO overwrite_with_abandon (key, value)
>>>>     VALUES (42, 'meaning of life')
>>>>     ON DUPLICATE (key) UPDATE;
>
>> Your syntax allows the exact same thing; it simply require the user to
>> be more verbose in order to get that behavior.  If we think that
>> people wanting that behavior will be rare, then it's fine to require
>> them to spell it out when they want it.  If we think it will be the
>> overwhelming common application of this feature, and I do, then making
>> people spell it out when we could just as well infer it is pointless.
>
> Did you consider my example? I think that people will like this idea,
> too - that clearly isn't the only consideration, though. As you say,
> it would be very easy to implement this. However, IMV, we shouldn't,
> because it is hazardous. MySQL doesn't allow this, and they tend to
> find expedients like this useful.

I'm considering your points in this area as well as I can, but as far
as I can tell you haven't actually set what's bad about it, just that
it might be hazardous in some way that you don't appear to have
specified, and that MySQL doesn't allow it.  I am untroubled by the
latter point; it is not our goal to confine our implementation to a
subset of MySQL.

But it would be fine to leave this kind of shortcut out of the initial
patch.  If other people agree that it's useful, it will get
re-proposed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Pavel Stehule
Date:
Subject: Re: JsonbValue to Jsonb conversion
Next
From: Sean Chittenden
Date:
Subject: Re: [PATCH] PostgreSQL 9.4 mmap(2) performance regression on FreeBSD...