Re: Making joins involving ctid work for the benefit of UPSERT - Mailing list pgsql-hackers
| From | Peter Geoghegan |
| --- | --- |
| Subject | Re: Making joins involving ctid work for the benefit of UPSERT |
| Date | |
| Msg-id | CAM3SWZR=24A3ARrGakTs5JZrwdyDLuysB5wfQ=aTJmo80We5pQ@mail.gmail.com |
| In response to | Re: Making joins involving ctid work for the benefit of UPSERT (Robert Haas <robertmhaas@gmail.com>) |
| Responses | Re: Making joins involving ctid work for the benefit of UPSERT |
| List | pgsql-hackers |
On Mon, Jul 28, 2014 at 8:37 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> AFAIUI, this is because your implementation uses lwlocks in a way that
> Andres and I both find unacceptable.

That's not the case. My implementation uses page-level heavyweight locks. The nbtree AM used to use them for other things. Plenty of other systems use index-level locks managed by a heavyweight lock manager.

>>> Here you seem to be suggesting that I intended to propose your existing
>>> design rather than something else, which I didn't. In this design,
>>> you find the conflict (at most one) by scanning for the tuple to be
>>> updated.
>>
>> Yes, but what if you don't see a conflict because it isn't visible to
>> your snapshot, and then you insert, and only then (step 5), presumably
>> with a dirty snapshot, you find a conflict? How does the loop
>> terminate if that brings you back to step 1 with the same MVCC
>> snapshot feeding the update?
>
> Good point. Maybe the syntax should be something like:
>
> UPSERT table (keycol [, keycol] ...) { VALUES (val [, val] ...) [,
> ...] | select_query }
>
> That would address both the concern about being able to pipe multiple
> tuples through it and the point you just raised. We look for a row
> that matches each given tuple on the key columns; if one is found, we
> update it; if none is found, we insert.

That basically is my design, except that (tangentially) yours risks bloat in exchange for not having to use a real locking mechanism, and has a different syntax. The parts of inserting into indexes that I stagger include an initial part that is more or less just an index scan. With this design you'll have to set things up so that all indexes can be directly accessed in the manner of ExecInsert() (get a list of them from the relcache, open them in an order that avoids possible deadlocks, etc.). Why not just use ExecInsert()? I don't think I'm alone in seeing things that way.

On a mostly unrelated note, I'll remind you of the reason that I felt it was best to lock indexes. It wasn't so much about avoiding bloat as it was about avoiding deadlocks. When I highlighted the issue, Heikki's prototype, which inserted optimistically rather than locking, was made to physically "super delete" the upsert-insert conflicting heap tuple (inserted optimistically before its index tuples) before going to lock a row, in order to avoid unprincipled deadlocking. In contrast, my design just used a callback that released page-level heavyweight locks before going to lock a row. Heikki's prototype involved making it so that *even someone else's dirty snapshot* didn't see our dead speculatively-inserted heap tuple.

Anyway, making all of that happen is fairly invasive to a bunch of places that are just as critical as the nbtree code. I'm not saying it can't be done, or even that it definitely shouldn't be, but taking an approach that produces bloat, rather than locking values the way other systems do (and, to a limited extent, Postgres already does), is at least far more invasive than it first appears. Plus, I ask you to consider that trade-off.

--
Peter Geoghegan
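To make the proposed semantics concrete, here is a hypothetical rendering of the UPSERT statement Robert sketches above. Nothing like this syntax was implemented at the time; the table, column, and staging-query names are invented for illustration:

```sql
-- Hypothetical only: the proposed (never-implemented) UPSERT syntax.
-- "id" is the key column; each tuple supplies values for every column.
-- A row matching on id is updated if found; otherwise the tuple is
-- inserted.
UPSERT counters (id)
    VALUES (1, 10),
           (2, 20);

-- The proposed grammar also allows a query as the tuple source:
UPSERT counters (id)
    SELECT id, val FROM staging_counters;
```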
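The non-termination hazard in the quoted exchange is the same one that forces the classic look-then-insert pattern into a retry loop. A minimal PL/pgSQL sketch of that pattern, in the spirit of the merge_db example from the PostgreSQL documentation (table and function names are invented):

```sql
CREATE TABLE counters (id int PRIMARY KEY, val int);

CREATE FUNCTION upsert_counter(k int, v int) RETURNS void AS
$$
BEGIN
    LOOP
        -- Look for a conflicting row using our MVCC snapshot.
        UPDATE counters SET val = v WHERE id = k;
        IF FOUND THEN
            RETURN;
        END IF;
        -- Our snapshot saw no match, so try to insert.  A concurrent
        -- insert is only detected here, at unique-index enforcement
        -- time, with a dirty snapshot.
        BEGIN
            INSERT INTO counters (id, val) VALUES (k, v);
            RETURN;
        EXCEPTION WHEN unique_violation THEN
            -- Do nothing; loop back to the UPDATE.  Under READ
            -- COMMITTED the retried UPDATE takes a fresh snapshot and
            -- can see the conflicting row, so the loop terminates.
            -- Fed the same MVCC snapshot (e.g. under REPEATABLE READ),
            -- it cannot, which is the termination question raised in
            -- the quoted exchange.
        END;
    END LOOP;
END;
$$ LANGUAGE plpgsql;
```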