Thread: Update with subselect sometimes returns wrong result
Hi!
Given the following table:
CREATE TABLE t1 (id INTEGER);
INSERT INTO t1 VALUES (0), (1);
Then the following UPDATE should return exactly one row:
UPDATE t1 SET id = t1.id
FROM (SELECT id FROM t1 LIMIT 1 FOR UPDATE) AS subset
WHERE t1.id = subset.id
RETURNING t1.id
And it does so, most of the time. But when run repeatedly in a loop like in the attached script, it will occasionally return 2 rows with two different id values, something the LIMIT 1 should prevent. In my tests it took anywhere from 0 to 10 minutes, and on average 1 to 2 minutes, to trigger the problem.
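The attached script itself is not reproduced in this archive. For illustration only, an in-database sketch of such a loop might look like the following (this is not the original script; it assumes 9.1 or later for the data-modifying CTE, and because plpgsql caches plans it may or may not exercise the same planner race as a client-side loop):

```sql
-- Hypothetical repro loop: delete/reinsert the two rows each
-- iteration, run the problematic UPDATE, and fail loudly if it
-- ever reports more than one updated row.
DO $$
DECLARE
    n bigint;
BEGIN
    FOR i IN 1..100000 LOOP
        DELETE FROM t1;
        INSERT INTO t1 VALUES (0), (1);
        WITH upd AS (
            UPDATE t1 SET id = t1.id
            FROM (SELECT id FROM t1 LIMIT 1 FOR UPDATE) AS subset
            WHERE t1.id = subset.id
            RETURNING t1.id
        )
        SELECT count(*) INTO n FROM upd;
        IF n <> 1 THEN
            RAISE EXCEPTION 'UPDATE returned % rows on iteration %', n, i;
        END IF;
    END LOOP;
END
$$;
```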
I have reproduced the issue on different machines and platforms with PG 9.3.1, 9.1.10, 9.0.14. (See file).
Interesting, and perhaps telling:
When autovacuum=off in postgresql.conf then I could not trigger the problem.
Oliver
Attachment
This looks like the same issue: http://www.postgresql.org/message-id/E1VN53g-0002Iy-Il@wrigleys.postgresql.org

2013/11/30 Oliver Seemann <oseemann@gmail.com>:
> Hi!
>
> Given the following table:
>
> CREATE TABLE t1 (id INTEGER);
> INSERT INTO t1 VALUES (0), (1);
>
> Then the following UPDATE should return exactly one row:
>
> UPDATE t1 SET id = t1.id
> FROM (SELECT id FROM t1 LIMIT 1 FOR UPDATE) AS subset
> WHERE t1.id = subset.id
> RETURNING t1.id
>
> And it does so, most of of the time. But when run repeatedly in a loop
> like in the attached script, then it will occasionally return 2 rows with
> two different id values, something the LIMIT 1 should prevent. In my tests
> it took from anywhere between 0 to 10 minutes and on average 1 to 2 minutes
> to trigger the problem.
>
> I have reproduced the issue on different machines and platforms with PG
> 9.3.1, 9.1.10, 9.0.14. (See file).
>
> Interesting, and perhaps telling:
> When autovacuum=off in postgresql.conf then I could not trigger the
> problem.
>
> Oliver
Oliver Seemann <oseemann@gmail.com> writes:
> Given the following table:
> CREATE TABLE t1 (id INTEGER);
> INSERT INTO t1 VALUES (0), (1);

> Then the following UPDATE should return exactly one row:
> UPDATE t1 SET id = t1.id
> FROM (SELECT id FROM t1 LIMIT 1 FOR UPDATE) AS subset
> WHERE t1.id = subset.id
> RETURNING t1.id

> And it does so, most of of the time. But when run repeatedly in a loop like
> in the attached script, then it will occasionally return 2 rows with two
> different id values, something the LIMIT 1 should prevent. In my tests it
> took from anywhere between 0 to 10 minutes and on average 1 to 2 minutes to
> trigger the problem.

I failed to reproduce the claimed misbehavior in git tip of any active branch. I'd like to think this means we fixed the problem in the last two months, but I don't see anything that looks like a promising candidate in the commit logs. Perhaps there is some important contributing factor you've not mentioned --- nondefault postgresql.conf settings, for instance.

			regards, tom lane
On 11/30/13, 6:57 PM, Tom Lane wrote:
> Oliver Seemann <oseemann@gmail.com> writes:
>> And it does so, most of of the time. But when run repeatedly in a loop like
>> in the attached script, then it will occasionally return 2 rows with two
>> different id values, something the LIMIT 1 should prevent. In my tests it
>> took from anywhere between 0 to 10 minutes and on average 1 to 2 minutes to
>> trigger the problem.
>
> I failed to reproduce the claimed misbehavior in git tip of any active
> branch. I'd like to think this means we fixed the problem in the last
> two months, but I don't see anything that looks like a promising candidate
> in the commit logs. Perhaps there is some important contributing factor
> you've not mentioned --- nondefault postgresql.conf settings, for
> instance.

I've managed to reproduce this against REL9_1_STABLE (4bdccd8427718f9c468e5e03286252f37ea771b5). I'm on OS X Mavericks, configure line: ./configure --enable-debug --with-openssl --with-perl --with-libxml --with-readline. Compiler Apple LLVM version 5.0 (clang-500.2.79) (based on LLVM 3.3svn). No changes to postgresql.conf.

Does that help? Do you need some more information?

Regards,
Marko Tiikkaja
On 2013-11-30 12:57:44 -0500, Tom Lane wrote:
> Oliver Seemann <oseemann@gmail.com> writes:
> > Given the following table:
> >
> > CREATE TABLE t1 (id INTEGER);
> > INSERT INTO t1 VALUES (0), (1);
> >
> > Then the following UPDATE should return exactly one row:
> >
> > UPDATE t1 SET id = t1.id
> > FROM (SELECT id FROM t1 LIMIT 1 FOR UPDATE) AS subset
> > WHERE t1.id = subset.id
> > RETURNING t1.id
> >
> > And it does so, most of of the time. But when run repeatedly in a loop like
> > in the attached script, then it will occasionally return 2 rows with two
> > different id values, something the LIMIT 1 should prevent. In my tests it
> > took from anywhere between 0 to 10 minutes and on average 1 to 2 minutes to
> > trigger the problem.
>
> I failed to reproduce the claimed misbehavior in git tip of any active
> branch. I'd like to think this means we fixed the problem in the last
> two months, but I don't see anything that looks like a promising candidate
> in the commit logs. Perhaps there is some important contributing factor
> you've not mentioned --- nondefault postgresql.conf settings, for
> instance.

Looks reproducible here as well; manually executing VACUUMs on the table greatly speeds things up --- it fails within seconds when doing so.

So, it looks like the limit returns more than one row; it's not updating the same row twice.

Slightly hacked-up (probably Python 2 only) version of the test script attached.

Greetings,

Andres Freund

-- 
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
I'll get to trying to write the release stuff rather than playing with more interesting things ;)

new row at: (0,4)
updated row from (0,2) to (0,1) iter 400
deleted row at: (0,1)
deleted row at: (0,5)
new row at: (0,1)
new row at: (0,5)
updated row from (0,1) to (0,3) iter 401
deleted row at: (0,2)
deleted row at: (0,3)
new row at: (0,2)
new row at: (0,3)
updated row from (0,1) to (0,3) iter 402
deleted row at: (0,2)
deleted row at: (0,3)
new row at: (0,2)
new row at: (0,3)
updated row from (0,4) to (0,1) iter 403
deleted row at: (0,1)
deleted row at: (0,5)
new row at: (0,1)
new row at: (0,5)
updated row from (0,1) to (0,3) iter 404
deleted row at: (0,2)
deleted row at: (0,3)
new row at: (0,2)
new row at: (0,3)
updated row from (0,1) to (0,3) iter 405
deleted row at: (0,2)
deleted row at: (0,3)
new row at: (0,2)
new row at: (0,3)
updated row from (0,4) to (0,6) iter 406
...
deleted row at: (0,2)
deleted row at: (0,3)
new row at: (0,2)
new row at: (0,3)
updated row from (0,4) to (0,1) iter 447
updated row from (0,5) to (0,2) iter 447
Traceback (most recent call last):
  File "/tmp/pgbug.py", line 80, in <module>
    test_bug()
  File "/tmp/pgbug.py", line 51, in test_bug
    update(cur, i)
  File "/tmp/pgbug.py", line 76, in update
    assert(len(rows) == 1)
AssertionError

There's clearly something wrong. (0,4) has been updated several times, but seems to still be visible.

Greetings,

Andres Freund

-- 
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment
Andres Freund-3 wrote
> deleted row at: (0,1)
> deleted row at: (0,5)
> new row at: (0,1)
> new row at: (0,5)
> updated row from (0,1) to (0,3) iter 401
> deleted row at: (0,2)
> deleted row at: (0,3)

Am I reading this right? Rows == ctid(,#)

Rows 1 & 5 exist and are deleted. Two new rows are added and stored at 1 & 5, replacing the just-deleted rows --- does auto-vacuum work that fast, that these physical locations are immediately available? Row 1 is updated, with the new value stored at 3 and 1 marked as deleted (not shown here). Rows 2? & 3 exist and are now deleted. Where did Row 2 come from? I was expecting row 5...

> new row at: (0,2)
> new row at: (0,3)
> updated row from (0,1) to (0,3) iter 405
> deleted row at: (0,2)
> deleted row at: (0,3)

Likewise: how did Row 1 come into being (so that it could be updated) when only 2 and 3 were added on the prior iteration?

Then, at the point of the assertion failure, rows 4 & 5 are updated but rows 2 & 3 were the rows that were added... so neither of the just-inserted rows was returned...

Sorry I cannot actually debug the code, but figured my observations might be helpful nonetheless.

David J.

--
View this message in context: http://postgresql.1045698.n5.nabble.com/Update-with-subselect-sometimes-returns-wrong-result-tp5780925p5781041.html
Sent from the PostgreSQL - bugs mailing list archive at Nabble.com.
Andres Freund <andres@2ndquadrant.com> writes:
> Slightly hacked up (probably python 2 only) version of the test script
> attached.

Ah, deleting and reinserting the rows each time like that makes it a lot more reproducible. I don't have the full story but it's got something to do with screwed-up tuple info flags.

The plan that's generated looks like

 Update on t1
   ->  Nested Loop
         Join Filter: (t1.id = subset.id)
         ->  Seq Scan on t1
         ->  Subquery Scan on subset
               ->  Limit
                     ->  LockRows
                           ->  Seq Scan on t1 t1_1

so the subquery scan will be executed twice, once for each row in t1. The first time through, heap_lock_tuple looks at the first tuple and locks it. It doesn't get called on the second tuple because the LIMIT node stops evaluation. But in the second scan, heap_lock_tuple returns HeapTupleSelfUpdated for the first tuple, so ExecLockRows ignores it, moves on to the second tuple, and returns that. The LIMIT is happy cause it just got one row back; it doesn't know it's not the same row as before. And this row of course passes the join qual with the second row from the outer scan of t1, so we perform a second update.

Now, the thing about this is that the tuple heap_lock_tuple is rejecting in the second pass is the one that we just updated, up at the ModifyTable plan node. So I can't find it exactly surprising that it says HeapTupleSelfUpdated. But tracing through tqual.c shows that the tuple has got the HEAP_XMAX_IS_MULTI bit set, which might be thought a bit peculiar. There's not multiple transactions involved.

Anyway, at this point I'm not so much wondering why it fails as why it (seems to) work *any* of the time. And how is it that VACUUM sometimes manages to flip it from working state to not-working state? (Once you're in the state where the UPDATE will say it updated two rows, it's 100% reproducible.)

Anyway, it seems pretty clear that the explanation is down somewhere in the tuple visibility and multixact logic that you and Alvaro have been hacking on with such vim lately. I'm out of steam for tonight, over to you ...

			regards, tom lane
I wrote:
> Anyway, at this point I'm not so much wondering why it fails as why it
> (seems to) work *any* of the time. And how is it that VACUUM sometimes
> manages to flip it from working state to not-working state? (Once you're
> in the state where the UPDATE will say it updated two rows, it's 100%
> reproducible.)

Oh, hah; the case where it works is where the generated plan is the other way around:

 Update on t1
   ->  Nested Loop
         Join Filter: (t1.id = subset.id)
         ->  Subquery Scan on subset
               ->  Limit
                     ->  LockRows
                           ->  Seq Scan on t1 t1_1
         ->  Seq Scan on t1

so that we never rescan the LockRows node. heap_lock_tuple followed by heap_update works sanely, the other way round not so much.

The apparent dependency on VACUUM is probably coming from updating the table's relpages/reltuples counts to new values in a way that causes the planner to think one version or the other is a bit cheaper.

I'd still kind of like to know how HEAP_XMAX_IS_MULTI is getting involved, but it seems that the fundamental problem here is we haven't thought through what the interactions of LockRows and ModifyTable operations in the same query ought to be.

			regards, tom lane
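As an aside for anyone following along: which of the two plan shapes the planner currently picks can be checked directly (this query is not from the thread; COSTS OFF needs 9.0 or later):

```sql
-- Shows whether t1 or the locking subquery ends up on the outer side
-- of the nested loop. The buggy behavior corresponds to the shape
-- where the LockRows node sits on the inner side and gets rescanned.
EXPLAIN (COSTS OFF)
UPDATE t1 SET id = t1.id
FROM (SELECT id FROM t1 LIMIT 1 FOR UPDATE) AS subset
WHERE t1.id = subset.id;
```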
On 2013-12-01 01:41:02 -0500, Tom Lane wrote:
> Now, the thing about this is that the tuple heap_lock_tuple is rejecting
> in the second pass is the one that we just updated, up at the ModifyTable
> plan node. So I can't find it exactly surprising that it says
> HeapTupleSelfUpdated. But tracing through tqual.c shows that the tuple
> has got the HEAP_XMAX_IS_MULTI bit set, which might be thought a bit
> peculiar. There's not multiple transactions involved.

Hm. Haven't looked at that, but if so that seems like an independent bug.

> Anyway, it seems pretty clear that the explanation is down somewhere in
> the tuple visibility and multixact logic that you and Alvaro have been
> hacking on with such vim lately. I'm out of steam for tonight, over
> to you ...

Will look at it for a bit, but I kinda doubt the multixact logic --- especially anything we've changed --- is all that involved. It's reproducing on 9.1, after all.

Greetings,

Andres Freund

-- 
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 2013-12-01 09:19:19 +0100, Andres Freund wrote:
> On 2013-12-01 01:41:02 -0500, Tom Lane wrote:
> > Now, the thing about this is that the tuple heap_lock_tuple is rejecting
> > in the second pass is the one that we just updated, up at the ModifyTable
> > plan node. So I can't find it exactly surprising that it says
> > HeapTupleSelfUpdated. But tracing through tqual.c shows that the tuple
> > has got the HEAP_XMAX_IS_MULTI bit set, which might be thought a bit
> > peculiar. There's not multiple transactions involved.
>
> Hm. Haven't looked at that, but if so that seems like an independent bug.

Afaics it's a missed optimization, nothing more. There's a branch in compute_new_xmax_infomask() where it considers xmax == add_to_xmax with imo somewhat strange conditions for the optimization:

	/*
	 * If the existing lock mode is identical to or weaker than the new
	 * one, we can act as though there is no existing lock, so set
	 * XMAX_INVALID and restart.
	 */
	if (xmax == add_to_xmax)
	{
		LockTupleMode old_mode = TUPLOCK_from_mxstatus(status);
		bool		old_isupd = ISUPDATE_from_mxstatus(status);

		/*
		 * We can do this if the new LockTupleMode is higher or equal than
		 * the old one; and if there was previously an update, we need an
		 * update, but if there wasn't, then we can accept there not being
		 * one.
		 */
		if ((mode >= old_mode) && (is_update || !old_isupd))
		{
			/*
			 * Note that the infomask might contain some other dirty bits.
			 * However, since the new infomask is reset to zero, we only
			 * set what's minimally necessary, and that the case that
			 * checks HEAP_XMAX_INVALID is the very first above, there is
			 * no need for extra cleanup of the infomask here.
			 */
			old_infomask |= HEAP_XMAX_INVALID;
			goto l5;
		}
	}

Several things:

a) If the old lock mode is stronger than the new one, we can just promote the new one. That's fine.

b) The old xmax cannot be an update; we wouldn't see the row version in that case. And in any case, ISUPDATE_from_mxstatus() isn't sufficient to determine whether the old row was an update; one needs to look at LOCK_ONLY as well, no?

c) Any reason we can't apply this optimization for subtransactions in some scenarios?

a), b) are relatively easy. Patch attached. Being a clear regression, I think it should be backpatched, but I'm not sure if it has to be in this point release. It's simple enough, but ...

The reason we didn't hit the (mode == old_mode && is_update) case above is that the UPDATE uses only LockTupleNoKeyExclusive since there are no keys, but FOR UPDATE was used before.

I am not awake enough to think about c) and it seems somewhat more complicated. I think we can only optimize it if the previous lock was taken in a subcommitted subtransaction, and not if it's in a transaction higher up the chain.

On the note of superfluous MultiXactIdCreate() calls, the latter contains the following assert:

	Assert(!TransactionIdEquals(xid1, xid2) || (status1 != status2));

Is there any reason we should *ever* add the same xid twice? Afaics there are several code paths to get there today, but imo we should work towards that never happening.

Greetings,

Andres Freund

-- 
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment
On 2013-12-01 12:45:14 +0100, Andres Freund wrote:
> b) the old xmax cannot be an update, we wouldn't see the row version in that
>    case. And in any way, ISUPDATE_from_mxstatus() isn't sufficient to
>    determine whether the old row was an update, one needs to look at
>    LOCK_ONLY as well, no?

That part's bogus; I missed part of the branch above the quoted code.

Greetings,

Andres Freund

-- 
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Hi,

On 2013-11-30 00:08:14 +0100, Oliver Seemann wrote:
> Then the following UPDATE should return exactly one row:
>
> UPDATE t1 SET id = t1.id
> FROM (SELECT id FROM t1 LIMIT 1 FOR UPDATE) AS subset
> WHERE t1.id = subset.id
> RETURNING t1.id

It turns out that this currently (as Tom points out) is a question of how the query is planned. UPDATEs with a FROM clause essentially are a join between the involved tables. Roughly, this query can be planned either as

a) scan all rows in subset, checking whether each matches a row in t1; or
b) scan all rows in t1, checking for each whether it matches a row in subset.

a) is perfectly fine for what you want; it will only return one row. But b) is problematic, since it will execute the subselect multiple times, once for each row in t1. "FOR locklevel" currently has the property of ignoring rows that the current command has modified, so you'll always get a different row back...

To get rid of that ambiguity, I suggest rewriting the query to look like:

WITH locked_row AS (
    SELECT id FROM t1 LIMIT 1 FOR UPDATE
)
UPDATE t1 SET id = t1.id
FROM (SELECT * FROM locked_row) locked
WHERE t1.id = locked.id
RETURNING t1.id;

That should always be safe, and indeed I cannot reproduce the problem that way.

Greetings,

Andres Freund

-- 
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 2013-12-01 02:03:55 -0500, Tom Lane wrote:
> The apparent dependency on VACUUM is probably coming from updating the
> table's relpages/reltuples counts to new values in a way that causes the
> planner to think one version or the other is a bit cheaper.

Hah, didn't realize that for a good bit... even though I had reproduced the problem with just concurrently ANALYZEing the table. Things could have clicked at that point...

> I'd still kind of like to know how HEAP_XMAX_IS_MULTI is getting
> involved,

Hopefully answered nearby.

> but it seems that the fundamental problem here is we haven't thought
> through what the interactions of LockRows and ModifyTable operations in
> the same query ought to be.

I think it's more that we haven't actually thought about the case where both happen in the same plan at all ;). I think most of that code is from the time when it was only possible to get there via

UPDATE ... WHERE CURRENT OF cursor_name

because FOR ... wasn't allowed in subselects.

The reason it reproducibly fails is:

	/*
	 * The target tuple was already updated or deleted by the
	 * current command, or by a later command in the current
	 * transaction. We *must* ignore the tuple in the former
	 * case, so as to avoid the "Halloween problem" of repeated
	 * update attempts. In the latter case it might be sensible
	 * to fetch the updated tuple instead, but doing so would
	 * require changing heap_lock_tuple as well as heap_update and
	 * heap_delete to not complain about updating "invisible"
	 * tuples, which seems pretty scary. So for now, treat the
	 * tuple as deleted and do not process.
	 */
	goto lnext;

in ExecLockRows(), right? Is there actually a real "Halloween problem" type of situation here?

But either way, even if we managed to finagle some meaning into that case, the query would still not be safe in any way, since there's no determinism in which row the subselect will return.

On a green field, I'd say we should forbid using FOR UPDATE below an UPDATE/DELETE and only allow combining them via a CTE. But that's probably hard to do now.

Greetings,

Andres Freund

-- 
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund-3 wrote
> Hi,
>
> On 2013-11-30 00:08:14 +0100, Oliver Seemann wrote:
>> Then the following UPDATE should return exactly one row:
>>
>> UPDATE t1 SET id = t1.id
>> FROM (SELECT id FROM t1 LIMIT 1 FOR UPDATE) AS subset
>> WHERE t1.id = subset.id
>> RETURNING t1.id
>
> It turns out, this currently (as Tom points out) is a question of how
> the query is planned. UPDATEs with a FROM essentially are a join between
> the involved tables. Roughly, this query can either be planned as
> a) Scan all rows in subset, check whether it matches a row in t1.
> or
> b) Scan all rows in t1, check for each whether it matches a row in subset.
>
> a) is perfectly fine for what you want, it will only return one row. But
> b) is problematic since it will execute the subselect multiple
> times, once for each row in t1. "FOR locklevel" currently has the property
> of ignoring rows that the current command has modified, so you'll always
> get a different row back...

From the previously referenced thread it appears that a WHERE IN (sub-select limit for update) clause exhibits the same behavior.

Can we make it so that option B is never considered a valid plan if the join target has a LIMIT (and/or the FOR UPDATE, depending) applied?

Also, why does B execute the sub-select multiple times? I would think it would only do that if the sub-query were correlated. The non-correlated sub-query should conceptually create a CTE on-the-fly and not require the caller to do so manually. While the entire FROM/WHERE section is inherently correlated due to the joining, the fact that the FROM's table reference is a query and not a simple relation means there are effectively two levels to consider.

If you really need option B, you can write a correlated EXISTS sub-query, which implies a LIMIT 1 and which satisfies the "the sub-query relation is much larger than the FROM relation, so I want to scan the FROM relation first" requirement.

David J.

--
View this message in context: http://postgresql.1045698.n5.nabble.com/Update-with-subselect-sometimes-returns-wrong-result-tp5780925p5781081.html
Sent from the PostgreSQL - bugs mailing list archive at Nabble.com.
Andres Freund <andres@2ndquadrant.com> writes:
> The reason it reproducably fails is:
> 	/*
> 	 * The target tuple was already updated or deleted by the
> 	 * current command, or by a later command in the current
> 	 * transaction. We *must* ignore the tuple in the former
> 	 * case, so as to avoid the "Halloween problem" of repeated
> 	 * update attempts. In the latter case it might be sensible
> 	 * to fetch the updated tuple instead, but doing so would
> 	 * require changing heap_lock_tuple as well as heap_update and
> 	 * heap_delete to not complain about updating "invisible"
> 	 * tuples, which seems pretty scary. So for now, treat the
> 	 * tuple as deleted and do not process.
> 	 */
> 	goto lnext;
> in ExecLockRows(), right? Is there actually a real "Halloween problem"
> type of situation here?

That's the $64 question at this point. In this example, at least, it seems like we'd want heap_lock_tuple to say that you *can* lock a tuple that's already updated by the current command. But that seems like a pretty scary solution --- I'm not sure there aren't other cases where it'd be the wrong thing.

> But either way, even if we would manage to finagle some meaning into
> that case, that query would still not be safe in any way, since there's
> no determinism in which row the subselect will return.

It's indeterminate to start with (and I guess the OP doesn't care). But once the first execution has chosen some row, what we want is for a rescan to return the same row as before. We definitely don't want heap_lock_tuple to *create* nondeterminism in a scan that otherwise didn't have any. Maybe the solution is to retain enough state to be able to tell that the tuple was locked, then updated, in the current command, and then return "already locked" in that case.

> On a green field, I'd say we should forbid using FOR UPDATE below an
> UPDATE/DELETE and just allow combining them via a CTE. But that's
> probably hard to do now.

Yeah, I was thinking the same thing.

In any case, sticking the SELECT FOR UPDATE into a WITH should provide an adequate workaround for now, at least for cases where the outer UPDATE doesn't ever try to update rows it hasn't read from the WITH. (If it does, then you have the same nondeterminism about whether the WITH would've returned those rows if it'd gotten to them first.)

			regards, tom lane
2013/12/1 Andres Freund <andres@2ndquadrant.com>:
> To get rid of that ambiguity, I suggest rewriting the query to look
> like:
> WITH locked_row AS (
>     SELECT id FROM t1 LIMIT 1 FOR UPDATE
> )
> UPDATE t1 SET id = t1.id
> FROM (SELECT * FROM locked_row) locked
> WHERE t1.id = locked.id
> RETURNING t1.id;

Thanks for looking into this and even providing a workaround!

The patch you posted previously is incomplete, right? Because I can still trigger the problem with the patch applied on top of git master. (I use autovacuum_naptime = 1s to reliably trigger within 1-5 seconds.)

Oliver
On 2013-12-02 22:55:30 +0100, Oliver Seemann wrote:
> 2013/12/1 Andres Freund <andres@2ndquadrant.com>:
> > To get rid of that ambiguity, I suggest rewriting the query to look
> > like:
> > WITH locked_row AS (
> >     SELECT id FROM t1 LIMIT 1 FOR UPDATE
> > )
> > UPDATE t1 SET id = t1.id
> > FROM (SELECT * FROM locked_row) locked
> > WHERE t1.id = locked.id
> > RETURNING t1.id;
>
> Thanks for looking into this and even providing a workaround!
>
> The patch you posted previously is incomplete, right? Because I can
> still trigger the problem with the patch applied on top of git master.
> (I use autovacuum_naptime = 1s to reliably trigger within 1-5 seconds).

The patch isn't for this issue; it's for something Tom noticed while investigating it --- purely a performance optimization/fix for a performance regression, albeit a noticeable one.

I'd judge that there's about zero chance the issue can be fixed in the stable branches; the likelihood of breaking other working code due to the required semantic changes is far too great.

Greetings,

Andres Freund

-- 
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund wrote:

> Several things:
> a) If the old lockmode is stronger than the new one, we can just promote
>    the new one. That's fine.
> b) the old xmax cannot be an update, we wouldn't see the row version in that
>    case. And in any way, ISUPDATE_from_mxstatus() isn't sufficient to
>    determine whether the old row was an update, one needs to look at
>    LOCK_ONLY as well, no?
> c) Any reason we can't apply this optimization for subtransactions in
>    some scenarios?
>
> a), b) are relatively easy. Patch attached. Being a clear regression, I
> think it should be backpatched, but I'm not sure if it has to be this
> point release. It's simple enough, but ...

Nice idea. I modified the patch slightly; please see attached.

I'm not sure about the added assert that the tuple cannot possibly be locked; I fear cursors provide strange ways to access tuples. I haven't been able to reproduce a problem, but consider an example such as the one below. Is it possible, I wonder, to arrive at the problematic scenario considering that we might try to traverse an update chain to lock future versions of the tuple? I suspect not, because if the tuple was updated (so that there is an update chain to traverse in the first place) then we wouldn't be able to update the original anyway. (I guess I'm mainly talking to myself to assure myself that there's no real problem here.)

In any case I think it's easy to handle the case by doing something like

	is_update |= ISUPDATE_from_mxstatus(old_status);

and removing the Assert().

alvherre=# create table f (a int primary key, b text);
CREATE TABLE
alvherre=# insert into f values (1, 'one');
INSERT 0 1
alvherre=# begin;
BEGIN
alvherre=# select * from f for update;
 a |   b
---+-------
 1 | three
(1 row)

alvherre=# declare f cursor for select * from f;
DECLARE CURSOR
alvherre=# fetch 1 from f;
 a |   b
---+-------
 1 | three
(1 row)

alvherre=# update f set b = 'two';
UPDATE 1
alvherre=# move backward all from f;
MOVE 0
alvherre=# fetch 1 from f;
 a |   b
---+-------
 1 | three
(1 row)

alvherre=# update f set b = 'four' where current of f;
UPDATE 1
alvherre=# select * from f;
 a |  b
---+------
 1 | four
(1 row)

alvherre=# commit;

-- 
Álvaro Herrera                    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment
On 2013-12-18 19:13:43 -0300, Alvaro Herrera wrote:
> I'm not sure about the added assert that the tuple cannot possibly be
> locked. I fear cursors provide strange ways to access at tuples.

I don't see how; the EPQ machinery should have ensured we're looking at the most recent version. Also, pretty fundamentally, we have to be the only locker, otherwise the optimization wouldn't be applicable in this way.

> In any case I think it's easy to handle the case by doing something like
> 	is_update |= ISUPDATE_from_mxstatus(old_status);
> and remove the Assert().

I think I'd rather have the chance to see the pathway to that than try to handle it. I think we have pretty little chance of doing so correctly if we don't know how it can happen.

Greetings,

Andres Freund

-- 
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund wrote:
> On 2013-12-18 19:13:43 -0300, Alvaro Herrera wrote:
> > I'm not sure about the added assert that the tuple cannot possibly be
> > locked. I fear cursors provide strange ways to access at tuples.
>
> I don't see how, the EPQ machinery should have ensured we're looking at
> the most recent version. Also, pretty fundamentally, we have to be the
> only locker, otherwise the optimization wouldn't be applicable in this
> way.

EPQ works funny with cursors in the WHERE CURRENT OF stuff; the fact that it behaves differently in the FOR UPDATE case than when there's no locking clause makes this whole thing pretty mysterious.

Anyway, I think this whole optimization can be formulated more clearly if we separate the case into its own block by checking XidIsCurrentTransaction instead of cramming it into the XidIsInProgress case as the original is doing; see attached.

Any ideas on possible tests for this stuff? Nothing comes to mind that doesn't involve pageinspect ...

-- 
Álvaro Herrera                    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment
Actually, the original coding is better because it enables easier writing of other optimizations, such as in the attached, which should also cure the performance regression noted in bug #8470.

-- 
Álvaro Herrera                    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Attachment
Alvaro Herrera wrote:
> Actually, the original coding is better because it enables easier
> writing of other optimizations, such as in the attached which should
> also cure the performance regression noted in bug #8470.

While trying to apply the second bit of this patch (where we try to skip acquiring a lock that an ancestor subxact already holds), I noticed that it doesn't work at all; moreover, the whole idea of locking a tuple twice from another subxact of the same transaction doesn't work. For example:

BEGIN;
SELECT tuple FOR NO KEY UPDATE;
SAVEPOINT f;
SELECT tuple FOR UPDATE;
...

This fails to acquire the second lock completely; only the NO KEY UPDATE lock is stored in the tuple. The reason this happens is that HeapTupleSatisfiesUpdate returns MayBeUpdated if the Xmax is a plain Xid that is LOCK_ONLY by our own transaction. We already commented in some other thread that maybe we should change this bit of HTSU behavior; but when I tried to do so to enable this optimization, I found that other routines die while trying to apply XactLockTableWait to the current transaction. This sorta makes sense --- it means that if we want to change this, we will need further surgery on callers of HTSU.

There's another problem, which is that this optimization would be applicable to locks only, not updates. Given that limitation, I think it would be rather pointless to try to do this.

I will keep working at the other part, involving multixacts.

-- 
Álvaro Herrera                    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
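For reference, the "SELECT tuple" lines above are shorthand; written out concretely against a real table (the table and column names here are illustrative, not from the thread), the scenario is:

```sql
-- Sketch of the double-lock scenario: the outer transaction takes the
-- weaker NO KEY UPDATE lock, then a subtransaction asks for FOR UPDATE.
BEGIN;
SELECT * FROM t1 WHERE id = 1 FOR NO KEY UPDATE;
SAVEPOINT f;
-- Per the analysis above, HeapTupleSatisfiesUpdate sees our own
-- LOCK_ONLY xmax and reports MayBeUpdated, so only the NO KEY UPDATE
-- lock ends up stored in the tuple; the strengthening is lost.
SELECT * FROM t1 WHERE id = 1 FOR UPDATE;
COMMIT;
```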
On Thu, Dec 19, 2013 at 9:44 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Alvaro Herrera escribió:
>> Actually, the original coding is better because it enables easier
>> writing of other optimizations, such as in the attached which should
>> also cure the performance regression noted in bug #8470.
>
> While trying to apply the second bit of this patch (where we try to skip
> acquiring a lock that an ancestor subxact already holds), I noticed that
> it doesn't work at all; moreover, the whole idea of locking a tuple
> twice from another subxact of the same transaction doesn't work.  For
> example:
>
>	BEGIN;
>	SELECT tuple FOR NO KEY UPDATE;
>	SAVEPOINT f;
>	SELECT tuple FOR UPDATE;
>	...
>
> This fails to acquire the second lock completely; only the NO KEY UPDATE
> lock is stored in the tuple.  The reason this happens is that
> HeapTupleSatisfiesUpdate returns MayBeUpdated if the Xmax is a plain Xid
> LOCK_ONLY by our own transaction.  We already commented in some other
> thread that maybe we should change this bit of HTSU behavior; but when I
> tried to do so to enable this optimization, I found that other routines
> die while trying to apply XactLockTableWait to the current transaction.
> This sorta makes sense --- it means that if we want to change this, we
> will need further surgery on callers of HTSU.
>
> There's another problem, which is that this optimization would be
> applicable to locks only, not updates.  Given this limitation I think it
> would be rather pointless to try to do this.
>
> I will keep working at the other part, involving multixacts.

Did anything come of this?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas escribió:
> On Thu, Dec 19, 2013 at 9:44 PM, Alvaro Herrera
> <alvherre@2ndquadrant.com> wrote:
> > Alvaro Herrera escribió:
> >> Actually, the original coding is better because it enables easier
> >> writing of other optimizations, such as in the attached which should
> >> also cure the performance regression noted in bug #8470.
> >
> > While trying to apply the second bit of this patch (where we try to skip
> > acquiring a lock that an ancestor subxact already holds), I noticed that
> > it doesn't work at all; moreover, the whole idea of locking a tuple
> > twice from another subxact of the same transaction doesn't work.  For
> > example:
> >
> >	BEGIN;
> >	SELECT tuple FOR NO KEY UPDATE;
> >	SAVEPOINT f;
> >	SELECT tuple FOR UPDATE;
> >	...
> >
> > This fails to acquire the second lock completely; only the NO KEY UPDATE
> > lock is stored in the tuple.  The reason this happens is that
> > HeapTupleSatisfiesUpdate returns MayBeUpdated if the Xmax is a plain Xid
> > LOCK_ONLY by our own transaction.  We already commented in some other
> > thread that maybe we should change this bit of HTSU behavior; but when I
> > tried to do so to enable this optimization, I found that other routines
> > die while trying to apply XactLockTableWait to the current transaction.
> > This sorta makes sense --- it means that if we want to change this, we
> > will need further surgery on callers of HTSU.
> >
> > There's another problem, which is that this optimization would be
> > applicable to locks only, not updates.  Given this limitation I think it
> > would be rather pointless to try to do this.
> >
> > I will keep working at the other part, involving multixacts.
>
> Did anything come of this?

I have paged out the details of all this stuff by now, but (as quoted
above) this is closely related to bug #8470.  I had a patch which was
supposed to fix the performance problem, but at some point I noticed that
there was a serious problem with it, so I put it aside.  (Of course,
there was also the small matter that I needed to focus on other patches.)

Now that I skim that patch again, I *think* there's a portion of it that
should be applied to 9.3 and HEAD.  I see that in bug #8470's thread I
didn't post the latest version I had.  This is it (including the serious
problem I mentioned above, which is related to acquiring a lock we
already own in a previous subxact, i.e. more or less exactly the case
we're trying to optimize.)

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services