Re: Making joins involving ctid work for the benefit of UPSERT - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Making joins involving ctid work for the benefit of UPSERT
Date
Msg-id CA+TgmoYNszLBSjDVf-bn7ff3dGWsgYNyLZY2y=zN=rZEBfL0vA@mail.gmail.com
Whole thread Raw
In response to Re: Making joins involving ctid work for the benefit of UPSERT  (Peter Geoghegan <pg@heroku.com>)
Responses Re: Making joins involving ctid work for the benefit of UPSERT  (Peter Geoghegan <pg@heroku.com>)
Re: Making joins involving ctid work for the benefit of UPSERT  (Greg Stark <stark@mit.edu>)
Re: Making joins involving ctid work for the benefit of UPSERT  (Peter Geoghegan <pg@heroku.com>)
List pgsql-hackers
On Sat, Jul 19, 2014 at 10:04 PM, Peter Geoghegan <pg@heroku.com> wrote:
> On Fri, Jul 18, 2014 at 11:23 AM, Andres Freund <andres@2ndquadrant.com> wrote:
>> Meh. A understandable syntax wouldn't require the pullups with a special
>> scan node and such.
>
> Well, in general ExecModifyTable()/ExecUpdate() trusts the tid passed
> to match the qual in the query. Unless you're willing to give up on
> the idea of having a qual on the update (other than something like an
> ON CONFLICTS qual, obviously) I think you'll probably end up with some
> kind of pseudo-scan node anyway, if only for consistency with how
> things already work, to give you somewhere to show the Filter in
> explain output and so on. ExecScan() probably needs to ExecQual().
> Inserts don't have scan nodes, but updates do, and so it seems pretty
> odd to make the "Insert" node (a ModifyTable node) pullup from a
> result node (as always), and the "Update" node (also a ModifyTable
> node) treat the first ModifyTable node as its own pseudo-scan node,
> when in fact we are scanning the heap in a manner not unlike a
> conventional update. Or do you envision a second result node where an
> update qual might be evaluated? I am not enthused about either
> possibility.
>
> The idea of not having the ability to put a predicate on the update at
> all, much like the MySQL thing isn't completely outrageous, but it
> certainly isn't going to go down well those that have already
> expressed concerns about how much of a foot gun this could be.

This all seems completely to one side of Andres's point.  I think what
he's saying is: don't implement an SQL syntax of the form INSERT ON
CONFLICT and let people use that to implement upsert.  Instead,
directly implement an SQL command called UPSERT or similar.  That's
long been my intuition as well.  I think generality here is not your
friend.

I'd suggest something like:

UPSERT table SET col = value [, ...] WHERE predicate;

with semantics like this:

1. Search the table, using any type of scan you like, for a row
matching the given predicate.
2. If you find more than one tuple that is visible to your scan, error.
3. If you find exactly one tuple that is visible to your scan, update it.  Stop.
4. If you find no tuple that is visible to your scan, but you do find
at least one tuple whose xmin has committed and whose xmax has not
committed, then (4a) if the isolation level is REPEATABLE READ or
higher, error; (4b) if there is more than one such tuple, error; else
(4b) update it, in violation of normal MVCC visibility rules.
5. Construct a new tuple using the col/value pairs from the SET clause
and try to insert it.  If this would fail with a unique index
violation, back out and go back to step 1.

Having said all that, I believe the INSERT ON CONFLICT syntax is more
easily comprehensible than previous proposals.  But I still tend to
agree with Andres that an explicit UPSERT syntax or something like it,
that captures all of the MVCC games inside itself, is likely
preferable from a user standpoint, whatever the implementation ends up
looking like.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Fabien COELHO
Date:
Subject: Re: gaussian distribution pgbench -- splits v4
Next
From: Emre Hasegeli
Date:
Subject: Re: Index-only scans for multicolumn GIST