Re: Making joins involving ctid work for the benefit of UPSERT - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Making joins involving ctid work for the benefit of UPSERT
Date
Msg-id CAM3SWZSa62EjAFwk0R7GLf1k=p2JSwkSatdazp5+Aqbdy1cJ1g@mail.gmail.com
Whole thread Raw
In response to Re: Making joins involving ctid work for the benefit of UPSERT  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Wed, Jul 23, 2014 at 9:58 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> I'd suggest something like:
>
> UPSERT table SET col = value [, ...] WHERE predicate;

I think I see another problem with this. The UPDATE must provide a
WHERE clause Var on a unique indexed column (let's say it's
constrained to provide a "(Var [unique-indexed opclass support
function 3 op] Const)" qual during parse analysis because you asked
for UPSERT. But it can also affect 0 rows for other reasons, since
this UPDATE qual can have arbitrary other expressions. So in general
you have any number of reasons for not updating, which implies that
you must insert, which might not be possible because there might even
still be duplicate key violation in a not-yet-visible-to-you row (even
just in the unique index you intended to merge on). Whereas, when
inserting, there is exactly one reason (per row) to go and update - a
conflict (i.e. a would-be duplicate violation). And this is a conflict
that is meaningful to every transaction, regardless of their snapshot,
since it represents an objective fact about the physical presence of
an index tuple.

So, do you make the UPDATE differentiate between different reasons for
the UPDATE failing, only inserting when an UPSERT would have happened
had you omitted the extra stuff in your UPSERT predicate? Can you
really differentiate anyway? And isn't the choice to do the insert
based on what your MVCC snapshot happens to see - rather than
something that there is necessarily objective truth on, the physical
presence of a duplicate tuple in an index - rather arbitrary? It's a
different situation to my implementation, where you do an insert-like
thing, and then find a conflict row to lock and then update, because
at that point the row is successfully locked, and the WHERE clause may
be evaluated against rejects and the *most recent version* of the
locked, conflict-causing row. The most recent, locked version is not
arbitrary at all. That's the version you ought to evaluate a predicate
against before finally deciding to UPDATE.

I assume you agree with my view that UPSERT should always insert or
update. This kind of business sounds closer to SQL MERGE, where an
insert may not drive things, and you accept these kinds of anomalies
in exchange for a lot more flexibility in not having inserts drive
things. Did you happen to read my post on that?
-- 
Peter Geoghegan



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: [COMMITTERS] pgsql: Diagnose incompatible OpenLDAP versions during build and test.
Next
From: Charlie Holleran
Date:
Subject: date-time library