Re: Update with subselect sometimes returns wrong result - Mailing list pgsql-bugs

From Tom Lane
Subject Re: Update with subselect sometimes returns wrong result
Date
Msg-id 846.1385919570@sss.pgh.pa.us
Whole thread Raw
In response to Re: Update with subselect sometimes returns wrong result  (Andres Freund <andres@2ndquadrant.com>)
List pgsql-bugs
Andres Freund <andres@2ndquadrant.com> writes:
> The reason it reproducably fails is:
> /*
>  * The target tuple was already updated or deleted by the
>  * current command, or by a later command in the current
>  * transaction.  We *must* ignore the tuple in the former
>  * case, so as to avoid the "Halloween problem" of repeated
>  * update attempts.  In the latter case it might be sensible
>  * to fetch the updated tuple instead, but doing so would
>  * require changing heap_lock_tuple as well as heap_update and
>  * heap_delete to not complain about updating "invisible"
>  * tuples, which seems pretty scary.  So for now, treat the
>  * tuple as deleted and do not process.
>  */
> goto lnext;

> in ExecLockRows(), right? Is there actually a real "Halloween problem"
> type of situation here?

That's the $64 question at this point.  In this example, at least,
it seems like we'd want heap_lock_tuple to say that you *can* lock
a tuple that's already updated by the current command.  But that
seems like a pretty scary solution --- I'm not sure there aren't
other cases where it'd be the wrong thing.

> But either way, even if we would manage to finagle some meaning into
> that case, that query would still not be safe in any way, since there's
> no determinism in which row the subselect will return.

It's indeterminate to start with (and I guess the OP doesn't care).
But once the first execution has chosen some row, what we want is
for a rescan to return the same row as before.  We definitely don't
want heap_lock_tuple to *create* nondetermininism in a scan that
otherwise didn't have any.

Maybe the solution is to retain enough state to be able to tell that
the tuple was locked, then updated, in the current command, and then
return "already locked" in that case.

> On a green field, I'd say we should forbid using FOR UPDATE below an
> UPDATE/DELETE and just allow combining them via a CTE. But that's
> probably hard to do now.

Yeah, I was thinking the same thing.  In any case, sticking the
SELECT FOR UPDATE into a WITH should provide an adequate workaround
for now, at least for cases where the outer UPDATE doesn't ever
try to update rows it's not read from the WITH.  (If it does, then
you have the same nondeterminism about whether the WITH would've
returned those rows if it'd gotten to them first.)

            regards, tom lane

pgsql-bugs by date:

Previous
From: David Johnston
Date:
Subject: Re: Update with subselect sometimes returns wrong result
Next
From: Sandeep Thakkar
Date:
Subject: Re: BUG #8639: installation failure