Making joins involving ctid work for the benefit of UPSERT - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Making joins involving ctid work for the benefit of UPSERT
Date
Msg-id CAM3SWZQ8FNnFDzs++vsahRKCoQQYtv+9zTEUANPeH2j4grQwuw@mail.gmail.com
Whole thread Raw
Responses Re: Making joins involving ctid work for the benefit of UPSERT  (Peter Geoghegan <pg@heroku.com>)
Re: Making joins involving ctid work for the benefit of UPSERT  (Andres Freund <andres@2ndquadrant.com>)
List pgsql-hackers
I'm working on UPSERT again. I think that in order to make useful
progress, I'll have to find a better way of overcoming the visibility
issues (i.e. the problem of what to do about a
still-in-progress-to-our-snapshot row being locked at READ COMMITTED
isolation level [1][2]).

I've made some tentative additions to my earlier patch to change the syntax:

postgres=# explain analyze
with c as ( insert into ints(key, val) values (1, 'a'), (2, 'b') on conflict select * for update) update ints set val =
c.valfrom c conflicts;                                                     QUERY PLAN
 

----------------------------------------------------------------------------------------------------------------------Update
onints  (cost=0.03..62.34 rows=2 width=104) (actual
 
time=0.095..0.095 rows=0 loops=1)  CTE c    ->  Insert on ints ints_1  (cost=0.00..0.03 rows=2 width=36)
(actual time=0.039..0.049 rows=2 loops=1)          ->  Values Scan on "*VALUES*"  (cost=0.00..0.03 rows=2
width=36) (actual time=0.001..0.002 rows=2 loops=1)  ->  Nested Loop  (cost=0.00..62.31 rows=2 width=104) (actual
time=0.063..0.080 rows=2 loops=1)        Join Filter: (ints.ctid = c.ctid)        Rows Removed by Join Filter: 6
-> CTE Scan on c  (cost=0.00..0.04 rows=2 width=100) (actual
 
time=0.044..0.055 rows=2 loops=1)        ->  Materialize  (cost=0.00..28.45 rows=1230 width=10)
(actual time=0.005..0.006 rows=4 loops=2)              ->  Seq Scan on ints  (cost=0.00..22.30 rows=1230
width=10) (actual time=0.009..0.009 rows=4 loops=1)Planning time: 0.132 msExecution time: 0.159 ms
(12 rows)

This works, as far as it goes. Parse analysis adds the ctid join to
the top-level UPDATE based on the fact that a magical UPDATE statement
(magical due to having a CONFLICTS clause) references a CTE in its
FromList (this is mandated by CONFLICTS during parse analysis), a CTE
containing an INSERT ON CONFLICT, and itself projecting a magical set
of rejected rows sufficient to get a locked c.ctid implicitly. (FWIW,
DELETEs may also accept a CONFLICTS clause and be used like this).

The optimizer does not support joins involving ctid, which is why
there is a seqscan plan - even setting "enable_seqscan = off" does not
alter the plan right now.

Apart from the obvious fact that we're doing an unnecessary Seq Scan
on the target table, which is unacceptable on performance grounds
(since we do in fact already know exactly what list of tids to pass to
the top-level ModifyTable node to UPDATE), there is another problem:
Since I cannot use a tid scan, I cannot play tricks with visibility
within tidscans for the benefit of solving the basic problem of doing
the right thing for READ COMMITTED. Clearly I need to add a new "MVCC
violation" in the spirit of EvalPlanQual() to avoid this problem
(since READ COMMITTED serialization failures are unacceptable to
users), but I'd like this new MVCC violation to be as selective as
possible.

Let me take a step back, though. Presently UPDATE WHERE CURRENT OF
does something that isn't a million miles from what I have in mind to
address the problem. That feature exists for "sensitive cursors",
where we lock rows (since FOR UPDATE was specified with the DECLARE
CURSOR statement) as they're fetched from the cursor. The upshot is
that in general, rows can change "out from under us" since they're not
yet locked, including in ways that might result in a new row version
not satisfying the original SELECT FOR UPDATE query predicate -
clearly users don't want to have to deal with that when they
subsequently go to UPDATE WHERE CURRENT OF. Special handling is
required. Purely because UPDATE WHERE CURRENT OF was specified, as
part of this special handling that update must use a tid scan. There
is a little bit of special machinery within the optimizer (within
cost_tidscan()) to force this for the isCurrentOf case.

The tid scan code within the executor also has special UPDATE WHERE
CURRENT OF handling. TidListCreate() specially handles the case by
asking execCurrentOf() to look up the cursor's portal, and to get a
tid from its QueryDesc for the current cursor position. TidNext()
specially fetches the most recent version visible to our estate's
Snapshot when we UPDATE WHERE CURRENT OF, without considering the
original SELECT FOR UPDATE predicate (and whether or not it would in
any sense continue to be satisfied). Each tid is then fetched from the
heap and returned. I think I ought to be able to do something similar
with this new CONFLICTS clause, by similarly marshaling some list of
tids ultimately originating from the ModifyTable node that locked a
potentially-not-yet-visible row (or maybe the optimizer recognizes the
internal special case and creates the tid scan node "visibility
credulous"). I'd then return heap tuples by using a dirty snapshot.
Since this can only happen for the UPDATE's tid scan, and there must
be a tid scan, it should be correct (on its own fairly reasonable
terms, for READ COMMITTED). After all, at this juncture for the
purposes of UPSERT it will already certainly be the case that the
tuple is locked.

I would like to:

* Get feedback on the hazards of taking this direction, and indeed the
advisability of the whole approach in general.

For example, I think we'd always want to use a DirtySnapshot, and
never try to just get the latest version visible to our esate
snapshot, but I might be wrong about that. Since my magical UPDATE
could have additional boolean conditions in its predicate (that is,
additions to the implicit ctid join, in contrast to the existing
magical WHERE CURRENT OF variant of UPDATE where that's always the
only thing in the predicate), the "morality" of *which* tid should get
to the UPDATE should certainly be considered. Of particular concern is
the fact that having Postgres do one or the other could make the
difference between an UPDATE qual being satisfied or not being
satisfied. Maybe this is an argument against being able to tack on
additional boolean conditions in the following fashion (which I'd
prefer to support without also supporting more that one join (the
magical implicit one), even though as I mentioned WHERE CURRENT OF
does not support boolean conditions):

with c as ( insert into ints(key, val) values (1, 'a'), (2, 'b') on conflict select * for update) update ints set val =
c.valfrom c conflicts where c.key != 1;
 

* Get some direction on what it would take to make joins involving
ctid work, since all of this hinges upon that infrastructure becoming
available. This appears to be a Simple Matter of Programming (at least
for someone that happens to already have a good understanding of the
optimizer), and is anticipated by this comment within tidpath.c:
* There is currently no special support for joins involving CTID; in* particular nothing corresponding to
best_inner_indexscan(). Since it's* not very useful to store TIDs of one table in another table, there* doesn't seem to
beenough use-case to justify adding a lot of code* for that.
 

[1] http://www.postgresql.org/message-id/AANLkTineR-rDFWENeddLg=GrkT+epMHk2j9X0YqpiTY8@mail.gmail.com
[2] http://www.pgcon.org/2014/schedule/attachments/327_upsert_weird.pdf
, slide 25
-- 
Peter Geoghegan



pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: [GSoC2014] Patch ALTER TABLE ... SET LOGGED
Next
From: Kouhei Kaigai
Date:
Subject: Re: [v9.5] Custom Plan API