Thread: Update with subselect sometimes returns wrong result
Hi!
Given the following table:
CREATE TABLE t1 (id INTEGER);
INSERT INTO t1 VALUES (0), (1);
Then the following UPDATE should return exactly one row:
UPDATE t1 SET id = t1.id
FROM (SELECT id FROM t1 LIMIT 1 FOR UPDATE) AS subset
WHERE t1.id = subset.id
RETURNING t1.id
And it does so, most of the time. But when run repeatedly in a loop like in the attached script, it will occasionally return 2 rows with two different id values, something the LIMIT 1 should prevent. In my tests it took anywhere from 0 to 10 minutes, and on average 1 to 2 minutes, to trigger the problem.
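The attached script itself is not reproduced in this archive. For illustration only, an in-database sketch of such a loop might look like the following (this is not the original script; it assumes 9.1 or later for the data-modifying CTE, and because plpgsql caches plans it may or may not exercise the same planner race as a client-side loop):

```sql
-- Hypothetical repro loop: delete/reinsert the two rows each
-- iteration, run the problematic UPDATE, and fail loudly if it
-- ever reports more than one updated row.
DO $$
DECLARE
    n bigint;
BEGIN
    FOR i IN 1..100000 LOOP
        DELETE FROM t1;
        INSERT INTO t1 VALUES (0), (1);
        WITH upd AS (
            UPDATE t1 SET id = t1.id
            FROM (SELECT id FROM t1 LIMIT 1 FOR UPDATE) AS subset
            WHERE t1.id = subset.id
            RETURNING t1.id
        )
        SELECT count(*) INTO n FROM upd;
        IF n <> 1 THEN
            RAISE EXCEPTION 'UPDATE returned % rows on iteration %', n, i;
        END IF;
    END LOOP;
END
$$;
```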
I have reproduced the issue on different machines and platforms with PG 9.3.1, 9.1.10, 9.0.14. (See file).
Interesting, and perhaps telling:
When autovacuum=off in postgresql.conf then I could not trigger the problem.
Oliver
Attachment
This looks like the same issue: http://www.postgresql.org/message-id/E1VN53g-0002Iy-Il@wrigleys.postgresql.org

2013/11/30 Oliver Seemann <oseemann@gmail.com>:
> Hi!
>
> Given the following table:
>
> CREATE TABLE t1 (id INTEGER);
> INSERT INTO t1 VALUES (0), (1);
>
> Then the following UPDATE should return exactly one row:
>
> UPDATE t1 SET id = t1.id
> FROM (SELECT id FROM t1 LIMIT 1 FOR UPDATE) AS subset
> WHERE t1.id = subset.id
> RETURNING t1.id
>
> And it does so, most of of the time. But when run repeatedly in a loop
> like in the attached script, then it will occasionally return 2 rows with
> two different id values, something the LIMIT 1 should prevent. In my tests
> it took from anywhere between 0 to 10 minutes and on average 1 to 2 minutes
> to trigger the problem.
>
> I have reproduced the issue on different machines and platforms with PG
> 9.3.1, 9.1.10, 9.0.14. (See file).
>
> Interesting, and perhaps telling:
> When autovacuum=off in postgresql.conf then I could not trigger the
> problem.
>
> Oliver
Oliver Seemann <oseemann@gmail.com> writes:
> Given the following table:
> CREATE TABLE t1 (id INTEGER);
> INSERT INTO t1 VALUES (0), (1);

> Then the following UPDATE should return exactly one row:
> UPDATE t1 SET id = t1.id
> FROM (SELECT id FROM t1 LIMIT 1 FOR UPDATE) AS subset
> WHERE t1.id = subset.id
> RETURNING t1.id

> And it does so, most of of the time. But when run repeatedly in a loop like
> in the attached script, then it will occasionally return 2 rows with two
> different id values, something the LIMIT 1 should prevent. In my tests it
> took from anywhere between 0 to 10 minutes and on average 1 to 2 minutes to
> trigger the problem.

I failed to reproduce the claimed misbehavior in git tip of any active branch. I'd like to think this means we fixed the problem in the last two months, but I don't see anything that looks like a promising candidate in the commit logs. Perhaps there is some important contributing factor you've not mentioned --- nondefault postgresql.conf settings, for instance.

			regards, tom lane
On 11/30/13, 6:57 PM, Tom Lane wrote:
> Oliver Seemann <oseemann@gmail.com> writes:
>> And it does so, most of of the time. But when run repeatedly in a loop like
>> in the attached script, then it will occasionally return 2 rows with two
>> different id values, something the LIMIT 1 should prevent. In my tests it
>> took from anywhere between 0 to 10 minutes and on average 1 to 2 minutes to
>> trigger the problem.
>
> I failed to reproduce the claimed misbehavior in git tip of any active
> branch. I'd like to think this means we fixed the problem in the last
> two months, but I don't see anything that looks like a promising candidate
> in the commit logs. Perhaps there is some important contributing factor
> you've not mentioned --- nondefault postgresql.conf settings, for
> instance.

I've managed to reproduce this against REL9_1_STABLE (4bdccd8427718f9c468e5e03286252f37ea771b5). I'm on OS X Mavericks, configure line: ./configure --enable-debug --with-openssl --with-perl --with-libxml --with-readline. Compiler Apple LLVM version 5.0 (clang-500.2.79) (based on LLVM 3.3svn). No changes to postgresql.conf.

Does that help? Do you need some more information?

Regards,
Marko Tiikkaja
On 2013-11-30 12:57:44 -0500, Tom Lane wrote:
> Oliver Seemann <oseemann@gmail.com> writes:
> > Given the following table:
> >
> > CREATE TABLE t1 (id INTEGER);
> > INSERT INTO t1 VALUES (0), (1);
> >
> > Then the following UPDATE should return exactly one row:
> >
> > UPDATE t1 SET id = t1.id
> > FROM (SELECT id FROM t1 LIMIT 1 FOR UPDATE) AS subset
> > WHERE t1.id = subset.id
> > RETURNING t1.id
> >
> > And it does so, most of of the time. But when run repeatedly in a loop like
> > in the attached script, then it will occasionally return 2 rows with two
> > different id values, something the LIMIT 1 should prevent. In my tests it
> > took from anywhere between 0 to 10 minutes and on average 1 to 2 minutes to
> > trigger the problem.
>
> I failed to reproduce the claimed misbehavior in git tip of any active
> branch. I'd like to think this means we fixed the problem in the last
> two months, but I don't see anything that looks like a promising candidate
> in the commit logs. Perhaps there is some important contributing factor
> you've not mentioned --- nondefault postgresql.conf settings, for
> instance.

Looks reproducible here as well; manually executing VACUUMs on the table greatly speeds things up --- it fails within seconds when doing so.

So, it looks like the limit returns more than one row; it's not updating the same row twice.

Slightly hacked-up (probably Python 2 only) version of the test script attached.

Greetings,

Andres Freund

-- 
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
I'll get to trying to write the release stuff rather than playing with more interesting things ;)

new row at: (0,4)
updated row from (0,2) to (0,1) iter 400
deleted row at: (0,1)
deleted row at: (0,5)
new row at: (0,1)
new row at: (0,5)
updated row from (0,1) to (0,3) iter 401
deleted row at: (0,2)
deleted row at: (0,3)
new row at: (0,2)
new row at: (0,3)
updated row from (0,1) to (0,3) iter 402
deleted row at: (0,2)
deleted row at: (0,3)
new row at: (0,2)
new row at: (0,3)
updated row from (0,4) to (0,1) iter 403
deleted row at: (0,1)
deleted row at: (0,5)
new row at: (0,1)
new row at: (0,5)
updated row from (0,1) to (0,3) iter 404
deleted row at: (0,2)
deleted row at: (0,3)
new row at: (0,2)
new row at: (0,3)
updated row from (0,1) to (0,3) iter 405
deleted row at: (0,2)
deleted row at: (0,3)
new row at: (0,2)
new row at: (0,3)
updated row from (0,4) to (0,6) iter 406
...
deleted row at: (0,2)
deleted row at: (0,3)
new row at: (0,2)
new row at: (0,3)
updated row from (0,4) to (0,1) iter 447
updated row from (0,5) to (0,2) iter 447
Traceback (most recent call last):
  File "/tmp/pgbug.py", line 80, in <module>
    test_bug()
  File "/tmp/pgbug.py", line 51, in test_bug
    update(cur, i)
  File "/tmp/pgbug.py", line 76, in update
    assert(len(rows) == 1)
AssertionError

There's clearly something wrong. (0,4) has been updated several times, but seems to still be visible.

Greetings,

Andres Freund

-- 
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment
Andres Freund-3 wrote
> deleted row at: (0,1)
> deleted row at: (0,5)
> new row at: (0,1)
> new row at: (0,5)
> updated row from (0,1) to (0,3) iter 401
> deleted row at: (0,2)
> deleted row at: (0,3)

Am I reading this right? Rows == ctid(,#)

Rows 1 & 5 exist and are deleted. Two new rows are added and stored at 1 & 5, replacing the just-deleted rows --- does auto-vacuum work that fast, that these physical locations are immediately available? Row 1 is updated, with the new value stored at 3 and 1 marked as deleted (not shown here). Rows 2? & 3 exist and are now deleted. Where did Row 2 come from? I was expecting row 5...

> new row at: (0,2)
> new row at: (0,3)
> updated row from (0,1) to (0,3) iter 405
> deleted row at: (0,2)
> deleted row at: (0,3)

Likewise: how did Row 1 come into being (so that it could be updated) when only 2 and 3 were added on the prior iteration?

Then, at the point of the assertion failure, rows 4 & 5 are updated but rows 2 & 3 were the rows that were added... so neither of the just-inserted rows was returned...

Sorry I cannot actually debug the code, but figured my observations might be helpful nonetheless.

David J.

--
View this message in context: http://postgresql.1045698.n5.nabble.com/Update-with-subselect-sometimes-returns-wrong-result-tp5780925p5781041.html
Sent from the PostgreSQL - bugs mailing list archive at Nabble.com.
Andres Freund <andres@2ndquadrant.com> writes:
> Slightly hacked up (probably python 2 only) version of the test script
> attached.

Ah, deleting and reinserting the rows each time like that makes it a lot more reproducible. I don't have the full story but it's got something to do with screwed-up tuple info flags.

The plan that's generated looks like

 Update on t1
   ->  Nested Loop
         Join Filter: (t1.id = subset.id)
         ->  Seq Scan on t1
         ->  Subquery Scan on subset
               ->  Limit
                     ->  LockRows
                           ->  Seq Scan on t1 t1_1

so the subquery scan will be executed twice, once for each row in t1. The first time through, heap_lock_tuple looks at the first tuple and locks it. It doesn't get called on the second tuple because the LIMIT node stops evaluation. But in the second scan, heap_lock_tuple returns HeapTupleSelfUpdated for the first tuple, so ExecLockRows ignores it, moves on to the second tuple, and returns that. The LIMIT is happy cause it just got one row back; it doesn't know it's not the same row as before. And this row of course passes the join qual with the second row from the outer scan of t1, so we perform a second update.

Now, the thing about this is that the tuple heap_lock_tuple is rejecting in the second pass is the one that we just updated, up at the ModifyTable plan node. So I can't find it exactly surprising that it says HeapTupleSelfUpdated. But tracing through tqual.c shows that the tuple has got the HEAP_XMAX_IS_MULTI bit set, which might be thought a bit peculiar. There's not multiple transactions involved.

Anyway, at this point I'm not so much wondering why it fails as why it (seems to) work *any* of the time. And how is it that VACUUM sometimes manages to flip it from working state to not-working state? (Once you're in the state where the UPDATE will say it updated two rows, it's 100% reproducible.)

Anyway, it seems pretty clear that the explanation is down somewhere in the tuple visibility and multixact logic that you and Alvaro have been hacking on with such vim lately. I'm out of steam for tonight, over to you ...

			regards, tom lane
I wrote:
> Anyway, at this point I'm not so much wondering why it fails as why it
> (seems to) work *any* of the time. And how is it that VACUUM sometimes
> manages to flip it from working state to not-working state? (Once you're
> in the state where the UPDATE will say it updated two rows, it's 100%
> reproducible.)

Oh, hah; the case where it works is where the generated plan is the other way around:

 Update on t1
   ->  Nested Loop
         Join Filter: (t1.id = subset.id)
         ->  Subquery Scan on subset
               ->  Limit
                     ->  LockRows
                           ->  Seq Scan on t1 t1_1
         ->  Seq Scan on t1

so that we never rescan the LockRows node. heap_lock_tuple followed by heap_update works sanely, the other way round not so much.

The apparent dependency on VACUUM is probably coming from updating the table's relpages/reltuples counts to new values in a way that causes the planner to think one version or the other is a bit cheaper.

I'd still kind of like to know how HEAP_XMAX_IS_MULTI is getting involved, but it seems that the fundamental problem here is we haven't thought through what the interactions of LockRows and ModifyTable operations in the same query ought to be.

			regards, tom lane
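As an aside for anyone following along: which of the two plan shapes the planner currently picks can be checked directly (this query is not from the thread; COSTS OFF needs 9.0 or later):

```sql
-- Shows whether t1 or the locking subquery ends up on the outer side
-- of the nested loop. The buggy behavior corresponds to the shape
-- where the LockRows node sits on the inner side and gets rescanned.
EXPLAIN (COSTS OFF)
UPDATE t1 SET id = t1.id
FROM (SELECT id FROM t1 LIMIT 1 FOR UPDATE) AS subset
WHERE t1.id = subset.id;
```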
On 2013-12-01 01:41:02 -0500, Tom Lane wrote:
> Now, the thing about this is that the tuple heap_lock_tuple is rejecting
> in the second pass is the one that we just updated, up at the ModifyTable
> plan node. So I can't find it exactly surprising that it says
> HeapTupleSelfUpdated. But tracing through tqual.c shows that the tuple
> has got the HEAP_XMAX_IS_MULTI bit set, which might be thought a bit
> peculiar. There's not multiple transactions involved.

Hm. Haven't looked at that, but if so that seems like an independent bug.

> Anyway, it seems pretty clear that the explanation is down somewhere in
> the tuple visibility and multixact logic that you and Alvaro have been
> hacking on with such vim lately. I'm out of steam for tonight, over
> to you ...

Will look at it for a bit, but I kinda doubt the multixact logic --- especially anything we've changed --- is all that involved. It's reproducing on 9.1, after all.

Greetings,

Andres Freund

-- 
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 2013-12-01 09:19:19 +0100, Andres Freund wrote:
> On 2013-12-01 01:41:02 -0500, Tom Lane wrote:
> > Now, the thing about this is that the tuple heap_lock_tuple is rejecting
> > in the second pass is the one that we just updated, up at the ModifyTable
> > plan node. So I can't find it exactly surprising that it says
> > HeapTupleSelfUpdated. But tracing through tqual.c shows that the tuple
> > has got the HEAP_XMAX_IS_MULTI bit set, which might be thought a bit
> > peculiar. There's not multiple transactions involved.
>
> Hm. Haven't looked at that, but if so that seems like an independent bug.

Afaics it's a missed optimization, nothing more. There's a branch in compute_new_xmax_infomask() where it considers xmax == add_to_xmax with imo somewhat strange conditions for the optimization:

	/*
	 * If the existing lock mode is identical to or weaker than the new
	 * one, we can act as though there is no existing lock, so set
	 * XMAX_INVALID and restart.
	 */
	if (xmax == add_to_xmax)
	{
		LockTupleMode old_mode = TUPLOCK_from_mxstatus(status);
		bool		old_isupd = ISUPDATE_from_mxstatus(status);

		/*
		 * We can do this if the new LockTupleMode is higher or equal than
		 * the old one; and if there was previously an update, we need an
		 * update, but if there wasn't, then we can accept there not being
		 * one.
		 */
		if ((mode >= old_mode) && (is_update || !old_isupd))
		{
			/*
			 * Note that the infomask might contain some other dirty bits.
			 * However, since the new infomask is reset to zero, we only
			 * set what's minimally necessary, and that the case that
			 * checks HEAP_XMAX_INVALID is the very first above, there is
			 * no need for extra cleanup of the infomask here.
			 */
			old_infomask |= HEAP_XMAX_INVALID;
			goto l5;
		}
	}

Several things:

a) If the old lock mode is stronger than the new one, we can just promote the new one. That's fine.

b) The old xmax cannot be an update; we wouldn't see the row version in that case. And in any case, ISUPDATE_from_mxstatus() isn't sufficient to determine whether the old row was an update; one needs to look at LOCK_ONLY as well, no?

c) Any reason we can't apply this optimization for subtransactions in some scenarios?

a), b) are relatively easy. Patch attached. Being a clear regression, I think it should be backpatched, but I'm not sure if it has to be in this point release. It's simple enough, but ...

The reason we didn't hit the (mode == old_mode && is_update) case above is that the UPDATE uses only LockTupleNoKeyExclusive since there are no keys, but FOR UPDATE was used before.

I am not awake enough to think about c) and it seems somewhat more complicated. I think we can only optimize it if the previous lock was taken in a subcommitted subtransaction, and not if it's in a transaction higher up the chain.

On the note of superfluous MultiXactIdCreate() calls, the latter contains the following assert:

	Assert(!TransactionIdEquals(xid1, xid2) || (status1 != status2));

Is there any reason we should *ever* add the same xid twice? Afaics there are several code paths to get there today, but imo we should work towards that never happening.

Greetings,

Andres Freund

-- 
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment
On 2013-12-01 12:45:14 +0100, Andres Freund wrote:
> b) the old xmax cannot be an update, we wouldn't see the row version in that
>    case. And in any way, ISUPDATE_from_mxstatus() isn't sufficient to
>    determine whether the old row was an update, one needs to look at
>    LOCK_ONLY as well, no?

That part's bogus; I missed part of the branch above the quoted code.

Greetings,

Andres Freund

-- 
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Hi,

On 2013-11-30 00:08:14 +0100, Oliver Seemann wrote:
> Then the following UPDATE should return exactly one row:
>
> UPDATE t1 SET id = t1.id
> FROM (SELECT id FROM t1 LIMIT 1 FOR UPDATE) AS subset
> WHERE t1.id = subset.id
> RETURNING t1.id

It turns out that this currently (as Tom points out) is a question of how the query is planned. UPDATEs with a FROM clause essentially are a join between the involved tables. Roughly, this query can be planned either as

a) scan all rows in subset, checking whether each matches a row in t1; or
b) scan all rows in t1, checking for each whether it matches a row in subset.

a) is perfectly fine for what you want; it will only return one row. But b) is problematic, since it will execute the subselect multiple times, once for each row in t1. "FOR locklevel" currently has the property of ignoring rows that the current command has modified, so you'll always get a different row back...

To get rid of that ambiguity, I suggest rewriting the query to look like:

WITH locked_row AS (
    SELECT id FROM t1 LIMIT 1 FOR UPDATE
)
UPDATE t1 SET id = t1.id
FROM (SELECT * FROM locked_row) locked
WHERE t1.id = locked.id
RETURNING t1.id;

That should always be safe, and indeed I cannot reproduce the problem that way.

Greetings,

Andres Freund

-- 
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 2013-12-01 02:03:55 -0500, Tom Lane wrote:
> The apparent dependency on VACUUM is probably coming from updating the
> table's relpages/reltuples counts to new values in a way that causes the
> planner to think one version or the other is a bit cheaper.

Hah, didn't realize that for a good bit... even though I had reproduced the problem with just concurrently ANALYZEing the table. Things could have clicked at that point...

> I'd still kind of like to know how HEAP_XMAX_IS_MULTI is getting
> involved,

Hopefully answered nearby.

> but it seems that the fundamental problem here is we haven't thought
> through what the interactions of LockRows and ModifyTable operations in
> the same query ought to be.

I think it's more that we haven't actually thought about the case where both happen in the same plan at all ;). I think most of that code is from the time when it was only possible to get there via

UPDATE ... WHERE CURRENT OF cursor_name

because FOR ... wasn't allowed in subselects.

The reason it reproducibly fails is:

	/*
	 * The target tuple was already updated or deleted by the
	 * current command, or by a later command in the current
	 * transaction. We *must* ignore the tuple in the former
	 * case, so as to avoid the "Halloween problem" of repeated
	 * update attempts. In the latter case it might be sensible
	 * to fetch the updated tuple instead, but doing so would
	 * require changing heap_lock_tuple as well as heap_update and
	 * heap_delete to not complain about updating "invisible"
	 * tuples, which seems pretty scary. So for now, treat the
	 * tuple as deleted and do not process.
	 */
	goto lnext;

in ExecLockRows(), right? Is there actually a real "Halloween problem" type of situation here?

But either way, even if we managed to finagle some meaning into that case, the query would still not be safe in any way, since there's no determinism in which row the subselect will return.

On a green field, I'd say we should forbid using FOR UPDATE below an UPDATE/DELETE and only allow combining them via a CTE. But that's probably hard to do now.

Greetings,

Andres Freund

-- 
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund-3 wrote
> Hi,
>
> On 2013-11-30 00:08:14 +0100, Oliver Seemann wrote:
>> Then the following UPDATE should return exactly one row:
>>
>> UPDATE t1 SET id = t1.id
>> FROM (SELECT id FROM t1 LIMIT 1 FOR UPDATE) AS subset
>> WHERE t1.id = subset.id
>> RETURNING t1.id
>
> It turns out, this currently (as Tom points out) is a question of how
> the query is planned. UPDATEs with a FROM essentially are a join between
> the involved tables. Roughly, this query can either be planned as
> a) Scan all rows in subset, check whether it matches a row in t1.
> or
> b) Scan all rows in t1, check for each whether it matches a row in subset.
>
> a) is perfectly fine for what you want, it will only return one row. But
> b) is problematic since it will execute the subselect multiple
> times, once for each row in t1. "FOR locklevel" currently has the property
> of ignoring rows that the current command has modified, so you'll always
> get a different row back...

From the previously referenced thread it appears that a WHERE IN (sub-select limit for update) clause exhibits the same behavior.

Can we make it so that option B is never considered a valid plan if the join target has a LIMIT (and/or the FOR UPDATE, depending) applied?

Also, why does B execute the sub-select multiple times? I would think it would only do that if the sub-query were correlated. The non-correlated sub-query should conceptually create a CTE on-the-fly and not require the caller to do so manually. While the entire FROM/WHERE section is inherently correlated due to the joining, the fact that the FROM's table reference is a query and not a simple relation means there are effectively two levels to consider.

If you really need option B, you can write a correlated EXISTS sub-query, which implies a LIMIT 1 and which satisfies the "the sub-query relation is much larger than the FROM relation, so I want to scan the FROM relation first" requirement.

David J.

--
View this message in context: http://postgresql.1045698.n5.nabble.com/Update-with-subselect-sometimes-returns-wrong-result-tp5780925p5781081.html
Sent from the PostgreSQL - bugs mailing list archive at Nabble.com.
Andres Freund <andres@2ndquadrant.com> writes:
> The reason it reproducably fails is:
> 	/*
> 	 * The target tuple was already updated or deleted by the
> 	 * current command, or by a later command in the current
> 	 * transaction. We *must* ignore the tuple in the former
> 	 * case, so as to avoid the "Halloween problem" of repeated
> 	 * update attempts. In the latter case it might be sensible
> 	 * to fetch the updated tuple instead, but doing so would
> 	 * require changing heap_lock_tuple as well as heap_update and
> 	 * heap_delete to not complain about updating "invisible"
> 	 * tuples, which seems pretty scary. So for now, treat the
> 	 * tuple as deleted and do not process.
> 	 */
> 	goto lnext;
> in ExecLockRows(), right? Is there actually a real "Halloween problem"
> type of situation here?

That's the $64 question at this point. In this example, at least, it seems like we'd want heap_lock_tuple to say that you *can* lock a tuple that's already updated by the current command. But that seems like a pretty scary solution --- I'm not sure there aren't other cases where it'd be the wrong thing.

> But either way, even if we would manage to finagle some meaning into
> that case, that query would still not be safe in any way, since there's
> no determinism in which row the subselect will return.

It's indeterminate to start with (and I guess the OP doesn't care). But once the first execution has chosen some row, what we want is for a rescan to return the same row as before. We definitely don't want heap_lock_tuple to *create* nondeterminism in a scan that otherwise didn't have any. Maybe the solution is to retain enough state to be able to tell that the tuple was locked, then updated, in the current command, and then return "already locked" in that case.

> On a green field, I'd say we should forbid using FOR UPDATE below an
> UPDATE/DELETE and just allow combining them via a CTE. But that's
> probably hard to do now.

Yeah, I was thinking the same thing.

In any case, sticking the SELECT FOR UPDATE into a WITH should provide an adequate workaround for now, at least for cases where the outer UPDATE doesn't ever try to update rows it hasn't read from the WITH. (If it does, then you have the same nondeterminism about whether the WITH would've returned those rows if it'd gotten to them first.)

			regards, tom lane
2013/12/1 Andres Freund <andres@2ndquadrant.com>:
> To get rid of that ambiguity, I suggest rewriting the query to look
> like:
> WITH locked_row AS (
>     SELECT id FROM t1 LIMIT 1 FOR UPDATE
> )
> UPDATE t1 SET id = t1.id
> FROM (SELECT * FROM locked_row) locked
> WHERE t1.id = locked.id
> RETURNING t1.id;

Thanks for looking into this and even providing a workaround!

The patch you posted previously is incomplete, right? Because I can still trigger the problem with the patch applied on top of git master. (I use autovacuum_naptime = 1s to reliably trigger within 1-5 seconds.)

Oliver
On 2013-12-02 22:55:30 +0100, Oliver Seemann wrote:
> 2013/12/1 Andres Freund <andres@2ndquadrant.com>:
> > To get rid of that ambiguity, I suggest rewriting the query to look
> > like:
> > WITH locked_row AS (
> >     SELECT id FROM t1 LIMIT 1 FOR UPDATE
> > )
> > UPDATE t1 SET id = t1.id
> > FROM (SELECT * FROM locked_row) locked
> > WHERE t1.id = locked.id
> > RETURNING t1.id;
>
> Thanks for looking into this and even providing a workaround!
>
> The patch you posted previously is incomplete, right? Because I can
> still trigger the problem with the patch applied on top of git master.
> (I use autovacuum_naptime = 1s to reliably trigger within 1-5 seconds).

The patch isn't for this issue; it's for something Tom noticed while investigating it --- purely a performance optimization/fix for a performance regression, albeit a noticeable one.

I'd judge that there's about zero chance the issue can be fixed in the stable branches; the likelihood of breaking other working code due to the required semantic changes is far too great.

Greetings,

Andres Freund

-- 
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund wrote:

> Several things:
> a) If the old lockmode is stronger than the new one, we can just promote
>    the new one. That's fine.
> b) the old xmax cannot be an update, we wouldn't see the row version in that
>    case. And in any way, ISUPDATE_from_mxstatus() isn't sufficient to
>    determine whether the old row was an update, one needs to look at
>    LOCK_ONLY as well, no?
> c) Any reason we can't apply this optimization for subtransactions in
>    some scenarios?
>
> a), b) are relatively easy. Patch attached. Being a clear regression, I
> think it should be backpatched, but I'm not sure if it has to be this
> point release. It's simple enough, but ...

Nice idea. I modified the patch slightly; please see attached.

I'm not sure about the added assert that the tuple cannot possibly be locked; I fear cursors provide strange ways to access tuples. I haven't been able to reproduce a problem, but consider an example such as the one below. Is it possible, I wonder, to arrive at the problematic scenario considering that we might try to traverse an update chain to lock future versions of the tuple? I suspect not, because if the tuple was updated (so that there is an update chain to traverse in the first place) then we wouldn't be able to update the original anyway. (I guess I'm mainly talking to myself to assure myself that there's no real problem here.)

In any case I think it's easy to handle the case by doing something like

	is_update |= ISUPDATE_from_mxstatus(old_status);

and removing the Assert().

alvherre=# create table f (a int primary key, b text);
CREATE TABLE
alvherre=# insert into f values (1, 'one');
INSERT 0 1
alvherre=# begin;
BEGIN
alvherre=# select * from f for update;
 a |   b
---+-------
 1 | three
(1 row)

alvherre=# declare f cursor for select * from f;
DECLARE CURSOR
alvherre=# fetch 1 from f;
 a |   b
---+-------
 1 | three
(1 row)

alvherre=# update f set b = 'two';
UPDATE 1
alvherre=# move backward all from f;
MOVE 0
alvherre=# fetch 1 from f;
 a |   b
---+-------
 1 | three
(1 row)

alvherre=# update f set b = 'four' where current of f;
UPDATE 1
alvherre=# select * from f;
 a |  b
---+------
 1 | four
(1 row)

alvherre=# commit;

-- 
Álvaro Herrera                    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment
On 2013-12-18 19:13:43 -0300, Alvaro Herrera wrote:
> I'm not sure about the added assert that the tuple cannot possibly be
> locked. I fear cursors provide strange ways to access at tuples.

I don't see how; the EPQ machinery should have ensured we're looking at the most recent version. Also, pretty fundamentally, we have to be the only locker, otherwise the optimization wouldn't be applicable in this way.

> In any case I think it's easy to handle the case by doing something like
> 	is_update |= ISUPDATE_from_mxstatus(old_status);
> and remove the Assert().

I think I'd rather have the chance to see the pathway to that than try to handle it. I think we have pretty little chance of doing so correctly if we don't know how it can happen.

Greetings,

Andres Freund

-- 
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund wrote:
> On 2013-12-18 19:13:43 -0300, Alvaro Herrera wrote:
> > I'm not sure about the added assert that the tuple cannot possibly be
> > locked. I fear cursors provide strange ways to access at tuples.
>
> I don't see how, the EPQ machinery should have ensured we're looking at
> the most recent version. Also, pretty fundamentally, we have to be the
> only locker, otherwise the optimization wouldn't be applicable in this
> way.

EPQ works funny with cursors in the WHERE CURRENT OF stuff; the fact that it behaves differently in the FOR UPDATE case than when there's no locking clause makes this whole thing pretty mysterious.

Anyway, I think this whole optimization can be formulated more clearly if we separate the case into its own block by checking XidIsCurrentTransaction instead of cramming it into the XidIsInProgress case as the original is doing; see attached.

Any ideas on possible tests for this stuff? Nothing comes to mind that doesn't involve pageinspect ...

-- 
Álvaro Herrera                    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment
Actually, the original coding is better because it enables easier writing of other optimizations, such as in the attached, which should also cure the performance regression noted in bug #8470.

-- 
Álvaro Herrera                    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment
Alvaro Herrera wrote:
> Actually, the original coding is better because it enables easier
> writing of other optimizations, such as in the attached which should
> also cure the performance regression noted in bug #8470.

While trying to apply the second bit of this patch (where we try to skip acquiring a lock that an ancestor subxact already holds), I noticed that it doesn't work at all; moreover, the whole idea of locking a tuple twice from another subxact of the same transaction doesn't work. For example:

BEGIN;
SELECT tuple FOR NO KEY UPDATE;
SAVEPOINT f;
SELECT tuple FOR UPDATE;
...

This fails to acquire the second lock completely; only the NO KEY UPDATE lock is stored in the tuple. The reason this happens is that HeapTupleSatisfiesUpdate returns MayBeUpdated if the Xmax is a plain Xid that is LOCK_ONLY by our own transaction. We already commented in some other thread that maybe we should change this bit of HTSU behavior; but when I tried to do so to enable this optimization, I found that other routines die while trying to apply XactLockTableWait to the current transaction. This sorta makes sense --- it means that if we want to change this, we will need further surgery on callers of HTSU.

There's another problem, which is that this optimization would be applicable to locks only, not updates. Given that limitation, I think it would be rather pointless to try to do this.

I will keep working at the other part, involving multixacts.

-- 
Álvaro Herrera                    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
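For reference, the "SELECT tuple" lines above are shorthand; written out concretely against a real table (the table and column names here are illustrative, not from the thread), the scenario is:

```sql
-- Sketch of the double-lock scenario: the outer transaction takes the
-- weaker NO KEY UPDATE lock, then a subtransaction asks for FOR UPDATE.
BEGIN;
SELECT * FROM t1 WHERE id = 1 FOR NO KEY UPDATE;
SAVEPOINT f;
-- Per the analysis above, HeapTupleSatisfiesUpdate sees our own
-- LOCK_ONLY xmax and reports MayBeUpdated, so only the NO KEY UPDATE
-- lock ends up stored in the tuple; the strengthening is lost.
SELECT * FROM t1 WHERE id = 1 FOR UPDATE;
COMMIT;
```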
On Thu, Dec 19, 2013 at 9:44 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Alvaro Herrera escribió:
>> Actually, the original coding is better because it enables easier
>> writing of other optimizations, such as in the attached which should
>> also cure the performance regression noted in bug #8470.
>
> While trying to apply the second bit of this patch (where we try to skip
> acquiring a lock that an ancestor subxact already holds), I noticed that
> it doesn't work at all; moreover, the whole idea of locking a tuple
> twice from another subxact of the same transaction doesn't work.  For
> example:
>
>	BEGIN;
>	SELECT tuple FOR NO KEY UPDATE;
>	SAVEPOINT f;
>	SELECT tuple FOR UPDATE;
>	...
>
> This fails to acquire the second lock completely; only the NO KEY UPDATE
> lock is stored in the tuple.  The reason this happens is that
> HeapTupleSatisfiesUpdate returns MayBeUpdated if the Xmax is a plain Xid
> LOCK_ONLY by our own transaction.  We already commented in some other
> thread that maybe we should change this bit of HTSU behavior; but when I
> tried to do so to enable this optimization, I found that other routines
> die while trying to apply XactLockTableWait to the current transaction.
> This sorta makes sense --- it means that if we want to change this, we
> will need further surgery on callers of HTSU.
>
> There's another problem, which is that this optimization would be
> applicable to locks only, not updates.  Given this limitation I think it
> would be rather pointless to try to do this.
>
> I will keep working at the other part, involving multixacts.

Did anything come of this?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas escribió:
> On Thu, Dec 19, 2013 at 9:44 PM, Alvaro Herrera
> <alvherre@2ndquadrant.com> wrote:
> > Alvaro Herrera escribió:
> >> Actually, the original coding is better because it enables easier
> >> writing of other optimizations, such as in the attached which should
> >> also cure the performance regression noted in bug #8470.
> >
> > While trying to apply the second bit of this patch (where we try to skip
> > acquiring a lock that an ancestor subxact already holds), I noticed that
> > it doesn't work at all; moreover, the whole idea of locking a tuple
> > twice from another subxact of the same transaction doesn't work.  For
> > example:
> >
> >	BEGIN;
> >	SELECT tuple FOR NO KEY UPDATE;
> >	SAVEPOINT f;
> >	SELECT tuple FOR UPDATE;
> >	...
> >
> > This fails to acquire the second lock completely; only the NO KEY UPDATE
> > lock is stored in the tuple.  The reason this happens is that
> > HeapTupleSatisfiesUpdate returns MayBeUpdated if the Xmax is a plain Xid
> > LOCK_ONLY by our own transaction.  We already commented in some other
> > thread that maybe we should change this bit of HTSU behavior; but when I
> > tried to do so to enable this optimization, I found that other routines
> > die while trying to apply XactLockTableWait to the current transaction.
> > This sorta makes sense --- it means that if we want to change this, we
> > will need further surgery on callers of HTSU.
> >
> > There's another problem, which is that this optimization would be
> > applicable to locks only, not updates.  Given this limitation I think it
> > would be rather pointless to try to do this.
> >
> > I will keep working at the other part, involving multixacts.
>
> Did anything come of this?

I have paged out the details of all this stuff by now, but (as quoted
above) this is closely related to bug #8470.  I had a patch which was
supposed to fix the performance problem, but at some point I noticed that
there was a serious problem with it, so I put it aside.  (Of course,
there was also the small matter that I needed to focus on other patches.)

Now that I skim that patch again, I *think* there's a portion of it that
should be applied to 9.3 and HEAD.  I see that in bug #8470's thread I
didn't post the latest version I had.  This is it (including the serious
problem I mentioned above, which is related to acquiring a lock we
already own in a previous subxact, i.e. more or less exactly the case
we're trying to optimize.)

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services