Re: Making joins involving ctid work for the benefit of UPSERT - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Making joins involving ctid work for the benefit of UPSERT
Date
Msg-id CAM3SWZR=24A3ARrGakTs5JZrwdyDLuysB5wfQ=aTJmo80We5pQ@mail.gmail.com
Whole thread Raw
In response to Re: Making joins involving ctid work for the benefit of UPSERT  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Making joins involving ctid work for the benefit of UPSERT
Re: Making joins involving ctid work for the benefit of UPSERT
Re: Making joins involving ctid work for the benefit of UPSERT
List pgsql-hackers
On Mon, Jul 28, 2014 at 8:37 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> AFAIUI, this is because your implementation uses lwlocks in a way that
> Andres and I both find unacceptable.

That's not the case. My implementation uses page-level heavyweight
locks. The nbtree AM used to use them for other stuff. Plenty of other
systems use index level locks managed by a heavyweight lock manager.

>>> Here you seem to be suggested that I intended to propose your existing
>>> design rather than something else, which I didn't.  In this design,
>>> you find the conflict (at most one) but scanning for the tuple to be
>>> updated.
>>
>> Yes, but what if you don't see a conflict because it isn't visible to
>> your snapshot, and then you insert, and only then (step 5), presumably
>> with a dirty snapshot, you find a conflict? How does the loop
>> terminate if that brings you back to step 1 with the same MVCC
>> snapshot feeding the update?
>
> Good point.  Maybe the syntax should be something like:
>
> UPSERT table (keycol [, keycol] ...) { VALUES (val [, val] ...) [,
> ...] | select_query }
>
> That would address both the concern about being able to pipe multiple
> tuples through it and the point you just raised.  We look for a row
> that matches each given tuple on the key columns; if one is found, we
> update it; if none is found, we insert.

That basically is my design, except that (tangentially) yours risks
bloat in exchange for not having to use a real locking mechanism, and
has a different syntax. The parts of inserting into an index scan that
I stagger include an initial part that is more or less just an index
scan. With this design you'll have to set up things so that all
indexes can be directly accessed in the manner of ExecInsert() (get a
list of them from the relcache, open them in an order that avoids
possible deadlocks, etc). Why not just use ExecInsert()? I don't think
I'm alone in seeing things that way.

On a mostly unrelated note, I'll remind you of the reason that I felt
it was best to lock indexes. It wasn't so much about avoiding bloat as
it was about avoiding deadlocks. When I highlighted the issue,
Heikki's prototype, which did insert optimistically rather than
locking, was then made to go and physically "super delete" the
upsert-insert conflicting heap tuple (inserted optimistically before
its index tuples), before going to lock a row, in order to avoid
unprincipled deadlocking. In contrast, my design just used a callback
that released page level heavyweight locks before going to lock a row.
Heikki's prototype involved making it so that *even someone else's
dirty snapshot* didn't see our dead speculatively-inserted heap tuple.

Anyway, making all that happen is fairly invasive to a bunch of places
that are just as critical as the nbtree code. I'm not saying it can't
be done, or even that it definitely shouldn't be, but taking an
approach that produces bloat, rather than locking values the way other
systems do (and, to a limited extent Postgres already does) is at
least way more invasive than it first appears. Plus,  I ask you to
consider that.

-- 
Peter Geoghegan



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: sendLong in custom communication doesn't work
Next
From: Peter Geoghegan
Date:
Subject: Re: Making joins involving ctid work for the benefit of UPSERT