Thread: change in LOCK behavior
Hi, I've just noticed a change of LOCK command behavior between 9.1 and 9.2, and I'm not sure whether this is expected or not. Let's use a very simple table CREATE TABLE x (id INT); Say there are two sessions - A and B, where A performs some operations on "x" and needs to protect them with an "ACCESS EXCLUSIVE" lock (e.g. it might be a pg_bulkload that acquires such locks, and we need to do that explicitly on one or two places). Session B is attempting to read the data, but is blocked and waits. On 9.1 it sees the commited data (which is what we need) but on 9.2 it sees only data commited at the time of the lock attemt. Example: A: BEGIN; A: LOCK x IN ACCESS EXCLUSIVE MODE; A: INSERT INTO x VALUES (100); B: SELECT * FROM x; A: COMMIT; Now on 9.1, B receives the value "100" while on 9.2 it gets no rows. Is this expected? I suspect the snapshot is read at different time or something, but I've checked release notes but I haven't seen anything relevant. Without getting the commited version of data, the locking is somehow pointless for us (unless using a different lock, not the table itself). regards Tomas
On Wed, Oct 10, 2012 at 10:21:51PM +0200, Tomas Vondra wrote: > Hi, > > I've just noticed a change of LOCK command behavior between 9.1 and 9.2, > and I'm not sure whether this is expected or not. > > Let's use a very simple table > > CREATE TABLE x (id INT); > > Say there are two sessions - A and B, where A performs some operations > on "x" and needs to protect them with an "ACCESS EXCLUSIVE" lock (e.g. > it might be a pg_bulkload that acquires such locks, and we need to do > that explicitly on one or two places). > > Session B is attempting to read the data, but is blocked and waits. On > 9.1 it sees the commited data (which is what we need) but on 9.2 it sees > only data commited at the time of the lock attemt. > > Example: > > A: BEGIN; > A: LOCK x IN ACCESS EXCLUSIVE MODE; > A: INSERT INTO x VALUES (100); > B: SELECT * FROM x; > A: COMMIT; > > Now on 9.1, B receives the value "100" while on 9.2 it gets no rows. > > Is this expected? I suspect the snapshot is read at different time or > something, but I've checked release notes but I haven't seen anything > relevant. > > Without getting the commited version of data, the locking is somehow > pointless for us (unless using a different lock, not the table itself). > > regards > Tomas > Hi Tomas, 9.2 is doing it right. Per the documentation on explicit locking: http://www.postgresql.org/docs/9.2/static/explicit-locking.html Tip: Only an ACCESS EXCLUSIVE lock blocks a SELECT (without FOR UPDATE/SHARE) statement. Regards, Ken
On 10.10.2012 22:37, ktm@rice.edu wrote: > On Wed, Oct 10, 2012 at 10:21:51PM +0200, Tomas Vondra wrote: >> Example: >> >> A: BEGIN; >> A: LOCK x IN ACCESS EXCLUSIVE MODE; >> A: INSERT INTO x VALUES (100); >> B: SELECT * FROM x; >> A: COMMIT; >> >> Now on 9.1, B receives the value "100" while on 9.2 it gets no rows. >> >> Is this expected? I suspect the snapshot is read at different time or >> something, but I've checked release notes but I haven't seen anything >> relevant. >> >> Without getting the commited version of data, the locking is somehow >> pointless for us (unless using a different lock, not the table itself). >> >> regards >> Tomas >> > Hi Tomas, > > 9.2 is doing it right. Per the documentation on explicit locking: > > http://www.postgresql.org/docs/9.2/static/explicit-locking.html > > Tip: Only an ACCESS EXCLUSIVE lock blocks a SELECT (without FOR UPDATE/SHARE) statement. That is not the problem. We do expect it to block (that's why we do this kind of lock in the first place), and that does happen both on 9.1 and 9.2. The difference is that 9.1 does see the changes performed in the other session (that held the lock and released it on commit), while 9.2 does not. Tomas
On Wednesday, October 10, 2012 10:21:51 PM Tomas Vondra wrote: > Hi, > > I've just noticed a change of LOCK command behavior between 9.1 and 9.2, > and I'm not sure whether this is expected or not. > > Let's use a very simple table > > CREATE TABLE x (id INT); > > Say there are two sessions - A and B, where A performs some operations > on "x" and needs to protect them with an "ACCESS EXCLUSIVE" lock (e.g. > it might be a pg_bulkload that acquires such locks, and we need to do > that explicitly on one or two places). > > Session B is attempting to read the data, but is blocked and waits. On > 9.1 it sees the commited data (which is what we need) but on 9.2 it sees > only data commited at the time of the lock attemt. > > Example: > > A: BEGIN; > A: LOCK x IN ACCESS EXCLUSIVE MODE; > A: INSERT INTO x VALUES (100); > B: SELECT * FROM x; > A: COMMIT; > > Now on 9.1, B receives the value "100" while on 9.2 it gets no rows. > > Is this expected? I suspect the snapshot is read at different time or > something, but I've checked release notes but I haven't seen anything > relevant. > > Without getting the commited version of data, the locking is somehow > pointless for us (unless using a different lock, not the table itself). That sounds like youre using different isolation levels in 9.1 and 9.2. Is that possible? I.e. your 9.1 test uses read committed, and 9.2 uses repeatable read or serializable. Greetings, Andres -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 10 October 2012 21:21, Tomas Vondra <tv@fuzzy.cz> wrote: > Hi, > > I've just noticed a change of LOCK command behavior between 9.1 and 9.2, > and I'm not sure whether this is expected or not. > > Let's use a very simple table > > CREATE TABLE x (id INT); > > Say there are two sessions - A and B, where A performs some operations > on "x" and needs to protect them with an "ACCESS EXCLUSIVE" lock (e.g. > it might be a pg_bulkload that acquires such locks, and we need to do > that explicitly on one or two places). > > Session B is attempting to read the data, but is blocked and waits. On > 9.1 it sees the commited data (which is what we need) but on 9.2 it sees > only data commited at the time of the lock attemt. > > Example: > > A: BEGIN; > A: LOCK x IN ACCESS EXCLUSIVE MODE; > A: INSERT INTO x VALUES (100); > B: SELECT * FROM x; > A: COMMIT; > > Now on 9.1, B receives the value "100" while on 9.2 it gets no rows. > > Is this expected? I suspect the snapshot is read at different time or > something, but I've checked release notes but I haven't seen anything > relevant. > > Without getting the commited version of data, the locking is somehow > pointless for us (unless using a different lock, not the table itself). I suspect it's this commit: d573e239f03506920938bf0be56c868d9c3416da http://archives.postgresql.org/pgsql-committers/2011-12/msg00167.php -- Thom
On 10.10.2012 22:42, Andres Freund wrote: > On Wednesday, October 10, 2012 10:21:51 PM Tomas Vondra wrote: >> Hi, >> >> I've just noticed a change of LOCK command behavior between 9.1 and 9.2, >> and I'm not sure whether this is expected or not. >> >> Let's use a very simple table >> >> CREATE TABLE x (id INT); >> >> Say there are two sessions - A and B, where A performs some operations >> on "x" and needs to protect them with an "ACCESS EXCLUSIVE" lock (e.g. >> it might be a pg_bulkload that acquires such locks, and we need to do >> that explicitly on one or two places). >> >> Session B is attempting to read the data, but is blocked and waits. On >> 9.1 it sees the commited data (which is what we need) but on 9.2 it sees >> only data commited at the time of the lock attemt. >> >> Example: >> >> A: BEGIN; >> A: LOCK x IN ACCESS EXCLUSIVE MODE; >> A: INSERT INTO x VALUES (100); >> B: SELECT * FROM x; >> A: COMMIT; >> >> Now on 9.1, B receives the value "100" while on 9.2 it gets no rows. >> >> Is this expected? I suspect the snapshot is read at different time or >> something, but I've checked release notes but I haven't seen anything >> relevant. >> >> Without getting the commited version of data, the locking is somehow >> pointless for us (unless using a different lock, not the table itself). > That sounds like youre using different isolation levels in 9.1 and 9.2. Is that > possible? I.e. your 9.1 test uses read committed, and 9.2 uses repeatable read > or serializable. Nope, it's 'read commited' on both. I haven't touched this, but I've verified it to be sure. ============ 9.1 ============ $ psql testdb psql (9.1.6) Type "help" for help. testdb=# show server_version;server_version ----------------9.1.6 (1 row) testdb=# show transaction_isolation ;transaction_isolation -----------------------read committed (1 row) ============ 9.2 ============ $ psql testdb psql (9.2.0) Type "help" for help. testdb=# show server_version;server_version ----------------9.2.0 (1 row) testdb=# show transaction_isolation testdb-# ;transaction_isolation -----------------------read committed (1 row)
On 10.10.2012 22:43, Thom Brown wrote: > On 10 October 2012 21:21, Tomas Vondra <tv@fuzzy.cz> wrote: >> Example: >> >> A: BEGIN; >> A: LOCK x IN ACCESS EXCLUSIVE MODE; >> A: INSERT INTO x VALUES (100); >> B: SELECT * FROM x; >> A: COMMIT; >> >> Now on 9.1, B receives the value "100" while on 9.2 it gets no rows. >> >> Is this expected? I suspect the snapshot is read at different time or >> something, but I've checked release notes but I haven't seen anything >> relevant. >> >> Without getting the commited version of data, the locking is somehow >> pointless for us (unless using a different lock, not the table itself). > > I suspect it's this commit: d573e239f03506920938bf0be56c868d9c3416da > > http://archives.postgresql.org/pgsql-committers/2011-12/msg00167.php Maybe, the description suggests it might be related. I'm still not sure whether this is a bug or expected behavior, although the commit clearly states that the change shouldn't be user-visible. Tomas
On Wednesday, October 10, 2012 10:43:57 PM Thom Brown wrote: > On 10 October 2012 21:21, Tomas Vondra <tv@fuzzy.cz> wrote: > > Hi, > > > > I've just noticed a change of LOCK command behavior between 9.1 and 9.2, > > and I'm not sure whether this is expected or not. > > > > Let's use a very simple table > > > > CREATE TABLE x (id INT); > > > > Say there are two sessions - A and B, where A performs some operations > > on "x" and needs to protect them with an "ACCESS EXCLUSIVE" lock (e.g. > > it might be a pg_bulkload that acquires such locks, and we need to do > > that explicitly on one or two places). > > > > Session B is attempting to read the data, but is blocked and waits. On > > 9.1 it sees the commited data (which is what we need) but on 9.2 it sees > > only data commited at the time of the lock attemt. > > > > Example: > > > > A: BEGIN; > > A: LOCK x IN ACCESS EXCLUSIVE MODE; > > A: INSERT INTO x VALUES (100); > > B: SELECT * FROM x; > > A: COMMIT; > > > > Now on 9.1, B receives the value "100" while on 9.2 it gets no rows. > > > > Is this expected? I suspect the snapshot is read at different time or > > something, but I've checked release notes but I haven't seen anything > > relevant. > > > > Without getting the commited version of data, the locking is somehow > > pointless for us (unless using a different lock, not the table itself). > > I suspect it's this commit: d573e239f03506920938bf0be56c868d9c3416da > > http://archives.postgresql.org/pgsql-committers/2011-12/msg00167.php Very likely, yes. In fact you get the same beaviour in 9.1 if you modify the example slightly: B: PREPARE foo AS SELECT * FROM x; A: BEGIN; A: LOCK x IN ACCESS EXCLUSIVE MODE; A: INSERT INTO x VALUES (100); B: EXECUTE foo; A: COMMIT; If you think about it for a second its not that surprising anymore. We start to execute a query, acquire a snapshot for that, and then wait for the locks on the target relations. We continue executing in the same snapshot for the duration of the statement and thus cannot see any of the new rows which committed *after* we assembled our snapshot. The easy workaround is acquiring a AccessShareLock in the B transaction separately. Greetings, Andres -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 10.10.2012 23:05, Andres Freund wrote: > On Wednesday, October 10, 2012 10:43:57 PM Thom Brown wrote: >> On 10 October 2012 21:21, Tomas Vondra <tv@fuzzy.cz> wrote: >>> Hi, >>> >>> I've just noticed a change of LOCK command behavior between 9.1 and 9.2, >>> and I'm not sure whether this is expected or not. >>> >>> Let's use a very simple table >>> >>> CREATE TABLE x (id INT); >>> >>> Say there are two sessions - A and B, where A performs some operations >>> on "x" and needs to protect them with an "ACCESS EXCLUSIVE" lock (e.g. >>> it might be a pg_bulkload that acquires such locks, and we need to do >>> that explicitly on one or two places). >>> >>> Session B is attempting to read the data, but is blocked and waits. On >>> 9.1 it sees the commited data (which is what we need) but on 9.2 it sees >>> only data commited at the time of the lock attemt. >>> >>> Example: >>> >>> A: BEGIN; >>> A: LOCK x IN ACCESS EXCLUSIVE MODE; >>> A: INSERT INTO x VALUES (100); >>> B: SELECT * FROM x; >>> A: COMMIT; >>> >>> Now on 9.1, B receives the value "100" while on 9.2 it gets no rows. >>> >>> Is this expected? I suspect the snapshot is read at different time or >>> something, but I've checked release notes but I haven't seen anything >>> relevant. >>> >>> Without getting the commited version of data, the locking is somehow >>> pointless for us (unless using a different lock, not the table itself). >> >> I suspect it's this commit: d573e239f03506920938bf0be56c868d9c3416da >> >> http://archives.postgresql.org/pgsql-committers/2011-12/msg00167.php > Very likely, yes. In fact you get the same beaviour in 9.1 if you modify the > example slightly: > > B: PREPARE foo AS SELECT * FROM x; > A: BEGIN; > A: LOCK x IN ACCESS EXCLUSIVE MODE; > A: INSERT INTO x VALUES (100); > B: EXECUTE foo; > A: COMMIT; > > If you think about it for a second its not that surprising anymore. We start to > execute a query, acquire a snapshot for that, and then wait for the locks on > the target relations. We continue executing in the same snapshot for the > duration of the statement and thus cannot see any of the new rows which > committed *after* we assembled our snapshot. Yes, that was my guess too (that the snapshot is acquired before asking for the lock and not re-acquired after getting the lock). > The easy workaround is acquiring a AccessShareLock in the B transaction > separately. I know - I've mentioned explicit locking as a possible solution in my first message, although it would make the whole process more complex. The question is whether that should be necessary or whether the 9.2 should behave the same as 9.1. Tomas
On Wednesday, October 10, 2012 11:23:10 PM Tomas Vondra wrote: > On 10.10.2012 23:05, Andres Freund wrote: > > On Wednesday, October 10, 2012 10:43:57 PM Thom Brown wrote: > >> On 10 October 2012 21:21, Tomas Vondra <tv@fuzzy.cz> wrote: > >>> Hi, > >>> > >>> I've just noticed a change of LOCK command behavior between 9.1 and > >>> 9.2, and I'm not sure whether this is expected or not. > >>> > >>> Let's use a very simple table > >>> > >>> CREATE TABLE x (id INT); > >>> > >>> Say there are two sessions - A and B, where A performs some operations > >>> on "x" and needs to protect them with an "ACCESS EXCLUSIVE" lock (e.g. > >>> it might be a pg_bulkload that acquires such locks, and we need to do > >>> that explicitly on one or two places). > >>> > >>> Session B is attempting to read the data, but is blocked and waits. On > >>> 9.1 it sees the commited data (which is what we need) but on 9.2 it > >>> sees only data commited at the time of the lock attemt. > >>> > >>> Example: > >>> > >>> A: BEGIN; > >>> A: LOCK x IN ACCESS EXCLUSIVE MODE; > >>> A: INSERT INTO x VALUES (100); > >>> B: SELECT * FROM x; > >>> A: COMMIT; > >>> > >>> Now on 9.1, B receives the value "100" while on 9.2 it gets no rows. > >>> > >>> Is this expected? I suspect the snapshot is read at different time or > >>> something, but I've checked release notes but I haven't seen anything > >>> relevant. > >>> > >>> Without getting the commited version of data, the locking is somehow > >>> pointless for us (unless using a different lock, not the table itself). > >> > >> I suspect it's this commit: d573e239f03506920938bf0be56c868d9c3416da > >> > >> http://archives.postgresql.org/pgsql-committers/2011-12/msg00167.php > > > > Very likely, yes. In fact you get the same beaviour in 9.1 if you modify > > the example slightly: > > > > B: PREPARE foo AS SELECT * FROM x; > > A: BEGIN; > > A: LOCK x IN ACCESS EXCLUSIVE MODE; > > A: INSERT INTO x VALUES (100); > > B: EXECUTE foo; > > A: COMMIT; > > > > If you think about it for a second its not that surprising anymore. We > > start to execute a query, acquire a snapshot for that, and then wait for > > the locks on the target relations. We continue executing in the same > > snapshot for the duration of the statement and thus cannot see any of > > the new rows which committed *after* we assembled our snapshot. > > Yes, that was my guess too (that the snapshot is acquired before asking > for the lock and not re-acquired after getting the lock). > > > The easy workaround is acquiring a AccessShareLock in the B transaction > > separately. > > I know - I've mentioned explicit locking as a possible solution in my > first message, although it would make the whole process more complex. I read your original statement as if you would want to use a separate lock (advisory?) which you don't need. > The question is whether that should be necessary or whether the 9.2 > should behave the same as 9.1. Given that 9.1 behaves the same as 9.2 with prepared statements I don't really see a convincing argument for changing this from the status quo. You can hit the same/similar behaviour in 9.1 even if youre not using PREPARE although the window isn't too big and you need DML + only an EXCLUSIVE (not access exlusive) lock for it. Greetings, Andres -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 10.10.2012 23:31, Andres Freund wrote: > On Wednesday, October 10, 2012 11:23:10 PM Tomas Vondra wrote: >> On 10.10.2012 23:05, Andres Freund wrote: >>> On Wednesday, October 10, 2012 10:43:57 PM Thom Brown wrote: >>>> On 10 October 2012 21:21, Tomas Vondra <tv@fuzzy.cz> wrote: >>>>> Hi, >>>>> >>>>> I've just noticed a change of LOCK command behavior between 9.1 and >>>>> 9.2, and I'm not sure whether this is expected or not. >>>>> >>>>> Let's use a very simple table >>>>> >>>>> CREATE TABLE x (id INT); >>>>> >>>>> Say there are two sessions - A and B, where A performs some operations >>>>> on "x" and needs to protect them with an "ACCESS EXCLUSIVE" lock (e.g. >>>>> it might be a pg_bulkload that acquires such locks, and we need to do >>>>> that explicitly on one or two places). >>>>> >>>>> Session B is attempting to read the data, but is blocked and waits. On >>>>> 9.1 it sees the commited data (which is what we need) but on 9.2 it >>>>> sees only data commited at the time of the lock attemt. >>>>> >>>>> Example: >>>>> >>>>> A: BEGIN; >>>>> A: LOCK x IN ACCESS EXCLUSIVE MODE; >>>>> A: INSERT INTO x VALUES (100); >>>>> B: SELECT * FROM x; >>>>> A: COMMIT; >>>>> >>>>> Now on 9.1, B receives the value "100" while on 9.2 it gets no rows. >>>>> >>>>> Is this expected? I suspect the snapshot is read at different time or >>>>> something, but I've checked release notes but I haven't seen anything >>>>> relevant. >>>>> >>>>> Without getting the commited version of data, the locking is somehow >>>>> pointless for us (unless using a different lock, not the table itself). >>>> >>>> I suspect it's this commit: d573e239f03506920938bf0be56c868d9c3416da >>>> >>>> http://archives.postgresql.org/pgsql-committers/2011-12/msg00167.php >>> >>> Very likely, yes. In fact you get the same beaviour in 9.1 if you modify >>> the example slightly: >>> >>> B: PREPARE foo AS SELECT * FROM x; >>> A: BEGIN; >>> A: LOCK x IN ACCESS EXCLUSIVE MODE; >>> A: INSERT INTO x VALUES (100); >>> B: EXECUTE foo; >>> A: COMMIT; >>> >>> If you think about it for a second its not that surprising anymore. We >>> start to execute a query, acquire a snapshot for that, and then wait for >>> the locks on the target relations. We continue executing in the same >>> snapshot for the duration of the statement and thus cannot see any of >>> the new rows which committed *after* we assembled our snapshot. >> >> Yes, that was my guess too (that the snapshot is acquired before asking >> for the lock and not re-acquired after getting the lock). >> >>> The easy workaround is acquiring a AccessShareLock in the B transaction >>> separately. >> >> I know - I've mentioned explicit locking as a possible solution in my >> first message, although it would make the whole process more complex. > I read your original statement as if you would want to use a separate lock > (advisory?) which you don't need. Oh yeah, right. Any lock would work - advisory or not. >> The question is whether that should be necessary or whether the 9.2 >> should behave the same as 9.1. > > Given that 9.1 behaves the same as 9.2 with prepared statements I don't really > see a convincing argument for changing this from the status quo. Well, equally it's not an argument for the 9.2 behavior, I guess. I'm not convinced this is a bug (partly because I haven't found any explicit statement regarding this in the docs), that's why I started this thread instead of spamming pgsql-bugs. For us (our app) this means we'll need to make it a bit more complex, add some more explicit locking that we did not need in 9.1. Acquiring an Access Share lock explicitly feels a bit strange, because that's the lock acquired by SELECT statement anyway. The only difference seems to be that the snapshot is not reacquired after obtaining the lock. Which may or may not be the right thing, depending on the definition of when the query was executed (when asking for the lock or after obtaining it?) Anyway, this seems to me like a behavior change that might bite many others, unknowingly depending on the 9.1-like behavior and I believe it's worth mentioning somewhere - not sure where. > You can hit the same/similar behaviour in 9.1 even if youre not using PREPARE > although the window isn't too big and you need DML + only an EXCLUSIVE (not > access exlusive) lock for it. Probably yes, but we're not doing that so I haven't noticed that. Tomas
On Wednesday, October 10, 2012 11:45:41 PM Tomas Vondra wrote: > On 10.10.2012 23:31, Andres Freund wrote: > > On Wednesday, October 10, 2012 11:23:10 PM Tomas Vondra wrote: > >> On 10.10.2012 23:05, Andres Freund wrote: > >>> On Wednesday, October 10, 2012 10:43:57 PM Thom Brown wrote: > >>>> On 10 October 2012 21:21, Tomas Vondra <tv@fuzzy.cz> wrote: > >>>>> Hi, > >>>>> > >>>>> I've just noticed a change of LOCK command behavior between 9.1 and > >>>>> 9.2, and I'm not sure whether this is expected or not. > >>>>> > >>>>> Let's use a very simple table > >>>>> > >>>>> CREATE TABLE x (id INT); > >>>>> > >>>>> Say there are two sessions - A and B, where A performs some > >>>>> operations on "x" and needs to protect them with an "ACCESS > >>>>> EXCLUSIVE" lock (e.g. it might be a pg_bulkload that acquires such > >>>>> locks, and we need to do that explicitly on one or two places). > >>>>> > >>>>> Session B is attempting to read the data, but is blocked and waits. > >>>>> On 9.1 it sees the commited data (which is what we need) but on 9.2 > >>>>> it sees only data commited at the time of the lock attemt. > >>>>> > >>>>> Example: > >>>>> > >>>>> A: BEGIN; > >>>>> A: LOCK x IN ACCESS EXCLUSIVE MODE; > >>>>> A: INSERT INTO x VALUES (100); > >>>>> B: SELECT * FROM x; > >>>>> A: COMMIT; > >>>>> > >>>>> Now on 9.1, B receives the value "100" while on 9.2 it gets no rows. > >>>>> > >>>>> Is this expected? I suspect the snapshot is read at different time or > >>>>> something, but I've checked release notes but I haven't seen anything > >>>>> relevant. > >>>>> > >>>>> Without getting the commited version of data, the locking is somehow > >>>>> pointless for us (unless using a different lock, not the table > >>>>> itself). > >>>> > >>>> I suspect it's this commit: d573e239f03506920938bf0be56c868d9c3416da > >>>> > >>>> http://archives.postgresql.org/pgsql-committers/2011-12/msg00167.php > >>> > >>> Very likely, yes. In fact you get the same beaviour in 9.1 if you > >>> modify the example slightly: > >>> > >>> B: PREPARE foo AS SELECT * FROM x; > >>> A: BEGIN; > >>> A: LOCK x IN ACCESS EXCLUSIVE MODE; > >>> A: INSERT INTO x VALUES (100); > >>> B: EXECUTE foo; > >>> A: COMMIT; > >>> > >>> If you think about it for a second its not that surprising anymore. We > >>> start to execute a query, acquire a snapshot for that, and then wait > >>> for the locks on the target relations. We continue executing in the > >>> same snapshot for the duration of the statement and thus cannot see > >>> any of the new rows which committed *after* we assembled our snapshot. > >> > >> Yes, that was my guess too (that the snapshot is acquired before asking > >> for the lock and not re-acquired after getting the lock). > >> > >>> The easy workaround is acquiring a AccessShareLock in the B transaction > >>> separately. > >> > >> I know - I've mentioned explicit locking as a possible solution in my > >> first message, although it would make the whole process more complex. > > > > I read your original statement as if you would want to use a separate > > lock (advisory?) which you don't need. > > Oh yeah, right. Any lock would work - advisory or not. Well, it needs to be a lock youre conflicting on, not any lock ;) > >> The question is whether that should be necessary or whether the 9.2 > >> should behave the same as 9.1. > > > > Given that 9.1 behaves the same as 9.2 with prepared statements I don't > > really see a convincing argument for changing this from the status quo. > > Well, equally it's not an argument for the 9.2 behavior, I guess. I'm > not convinced this is a bug (partly because I haven't found any explicit > statement regarding this in the docs), that's why I started this thread > instead of spamming pgsql-bugs. > > For us (our app) this means we'll need to make it a bit more complex, > add some more explicit locking that we did not need in 9.1. Acquiring an > Access Share lock explicitly feels a bit strange, because that's the > lock acquired by SELECT statement anyway. Yea, but its acquired *after* the snapshot is taken. And again, thats what happened in 9.1 as well. Just that *another* snapshot was just for planning the query which by also needs to lock the table in share mode. So after the lock was taken for planning a new snapshot was acquired for execution... Thats not the case anymore in simpler cases. > The only difference seems to be that the snapshot is not reacquired > after obtaining the lock. Which may or may not be the right thing, > depending on the definition of when the query was executed (when asking > for the lock or after obtaining it?) You can't generally reacquire snapshots after waiting for a lock. For one it would be noticeably expensive and for another it would actually result in very strange behaviour in queries with multiple tables. > Anyway, this seems to me like a behavior change that might bite many > others, unknowingly depending on the 9.1-like behavior and I believe > it's worth mentioning somewhere - not sure where. "Locking is not as simple as you (and most of us) thought!" ;) > > You can hit the same/similar behaviour in 9.1 even if youre not using > > PREPARE although the window isn't too big and you need DML + only an > > EXCLUSIVE (not access exlusive) lock for it. > > Probably yes, but we're not doing that so I haven't noticed that. Btw, unrelated to this problem, but why are you access exlusive locking that table? Shouldn't an exlusive lock be enough? Greetings, Andres -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 10.10.2012 23:57, Andres Freund wrote: > On Wednesday, October 10, 2012 11:45:41 PM Tomas Vondra wrote: >> Oh yeah, right. Any lock would work - advisory or not. > Well, it needs to be a lock youre conflicting on, not any lock ;) Oh, yeah, right. I was thinking about acquiring the same advisory lock in both sessions, but somehow I forgot to mention that. >> For us (our app) this means we'll need to make it a bit more complex, >> add some more explicit locking that we did not need in 9.1. Acquiring an >> Access Share lock explicitly feels a bit strange, because that's the >> lock acquired by SELECT statement anyway. > Yea, but its acquired *after* the snapshot is taken. And again, thats what > happened in 9.1 as well. Just that *another* snapshot was just for planning the > query which by also needs to lock the table in share mode. So after the lock > was taken for planning a new snapshot was acquired for execution... Thats not > the case anymore in simpler cases. Yes, exactly what I was suspecting. >> The only difference seems to be that the snapshot is not reacquired >> after obtaining the lock. Which may or may not be the right thing, >> depending on the definition of when the query was executed (when asking >> for the lock or after obtaining it?) > You can't generally reacquire snapshots after waiting for a lock. For one it > would be noticeably expensive and for another it would actually result in very > strange behaviour in queries with multiple tables. Not sure what strange behaviour would that cause (given that 9.1 did that if I understand that correctly). More expensive - no doubt about it, and the commit message mentions that it's an optimization. >> Anyway, this seems to me like a behavior change that might bite many >> others, unknowingly depending on the 9.1-like behavior and I believe >> it's worth mentioning somewhere - not sure where. > "Locking is not as simple as you (and most of us) thought!" ;) I've never thought it's simple, there's a lot of things to deal with. But I was somehow surprised that something that worked fine for quite long time, suddenly broke on 9.2. Yes, it might be a dependency on something that was not really guaranteed. >>> You can hit the same/similar behaviour in 9.1 even if youre not using >>> PREPARE although the window isn't too big and you need DML + only an >>> EXCLUSIVE (not access exlusive) lock for it. >> >> Probably yes, but we're not doing that so I haven't noticed that. > Btw, unrelated to this problem, but why are you access exlusive locking that > table? Shouldn't an exlusive lock be enough? Long story short, we don't, pg_bulkload does. And it's also the reason why we're so unhappy about the change, because the other session is waiting for the load to complete and then it is served with an old snapshot. Tomas
Tomas Vondra <tv@fuzzy.cz> writes: > On 10.10.2012 22:43, Thom Brown wrote: >> On 10 October 2012 21:21, Tomas Vondra <tv@fuzzy.cz> wrote: >>> A: BEGIN; >>> A: LOCK x IN ACCESS EXCLUSIVE MODE; >>> A: INSERT INTO x VALUES (100); >>> B: SELECT * FROM x; >>> A: COMMIT; >>> >>> Now on 9.1, B receives the value "100" while on 9.2 it gets no rows. >>> >>> Is this expected? I suspect the snapshot is read at different time or >>> something, but I've checked release notes but I haven't seen anything >>> relevant. >>> >>> Without getting the commited version of data, the locking is somehow >>> pointless for us (unless using a different lock, not the table itself). >> I suspect it's this commit: d573e239f03506920938bf0be56c868d9c3416da >> http://archives.postgresql.org/pgsql-committers/2011-12/msg00167.php > Maybe, the description suggests it might be related. I'm still not sure > whether this is a bug or expected behavior, although the commit clearly > states that the change shouldn't be user-visible. Yeah, I think that last is the key point: this patch was sold on the grounds that it wouldn't cause any interesting user-visible behavioral change, but your example blows that claim into tiny little pieces. I'm inclined to think we need to revert this. The performance gain is not worth the prospect of breaking a lot of applications that used to work reliably. Robert? regards, tom lane
On Wed, Oct 10, 2012 at 7:10 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Yeah, I think that last is the key point: this patch was sold on the > grounds that it wouldn't cause any interesting user-visible behavioral > change, but your example blows that claim into tiny little pieces. > > I'm inclined to think we need to revert this. The performance gain is > not worth the prospect of breaking a lot of applications that used to > work reliably. Robert? Yeah, I have to admit that I didn't consider the possibility that a lock wait might intervene between the first and second snapshots. But something about this isn't making sense to me. The whole idea of that patch is that we reuse the parse/plan snapshot at execution time. If it's wrong to use a snapshot that might be stale at execution time, then it's equally wrong to use it for parse/plan. Surely, parse/plan also requires a lock before doing anything interesting with the table, so the problem here must be that we're taking the parse/plan snapshot, then getting the lock (after waiting), then doing something with the parse/plan snapshot that fails to notice that the snapshot is stale. However, in this example, the parse/plan stuff we do with the stale snapshot fails to be noticeable, but if we continue on to execution then it is noticeable. But that sounds more like a happy accident than anything else. Andres' example upthread shows that a slightly different test case breaks on 9.1 and 9.2, and I suspect it's possible to construct other examples that behave in surprising ways on 9.1 but *not* on 9.2, by looking for a scenario where the fact that parse/plan uses a different snapshot from execution manifests itself as a user-visible inconsistency. Reverting the patch has the advantage of being simple, not requiring a lot of thought, and restoring the historical behavior, but I'm not that excited about it otherwise. It seems to me that the root of the issue here is that people is not that people expect two snapshots -- indeed, a number of people strongly supported getting rid of that behavior at the time -- but rather that they expect the snapshot to be taken after locks are acquired. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Wed, Oct 10, 2012 at 7:10 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Yeah, I think that last is the key point: this patch was sold on the >> grounds that it wouldn't cause any interesting user-visible behavioral >> change, but your example blows that claim into tiny little pieces. >> >> I'm inclined to think we need to revert this. The performance gain is >> not worth the prospect of breaking a lot of applications that used to >> work reliably. Robert? > Yeah, I have to admit that I didn't consider the possibility that a > lock wait might intervene between the first and second snapshots. But > something about this isn't making sense to me. It's not hard to see what's happening. The old code was: * take parse/plan snapshot* parse & plan query (includes taking AccessShare lock on tables)* take execution snapshot* runquery Taking the AccessShare lock blocks until the transaction with exclusive lock commits; therefore, the execution snapshot will see that transaction's effects. In the new code, we re-use the parse/plan snapshot for execution, so it predates the other transaction's commit, so we don't see its effects. > The whole idea of that > patch is that we reuse the parse/plan snapshot at execution time. If > it's wrong to use a snapshot that might be stale at execution time, > then it's equally wrong to use it for parse/plan. Nope, because the parse/plan snapshot is only relevant to planner estimation purposes. It doesn't have to be exactly right. (What does have to be exactly right is our idea of the table structure, but that's okay because any catalog consultation we do is with SnapshotNow, and hence will see any DDL the other transaction might have done.) > But that sounds more like a happy accident than anything else. [ shrug... ] Maybe it's a happy accident and maybe somebody designed it with malice aforethought many years ago. But the point is that this type of thing worked, with 100% reliability, before 9.2. Now it does not. I don't think we can accept that. > It seems to me that the > root of the issue here is that people is not that people expect two > snapshots -- indeed, a number of people strongly supported getting rid > of that behavior at the time -- but rather that they expect the > snapshot to be taken after locks are acquired. Sure. Maybe we can rejigger things in a way that does that, although I think the stumbling block is going to be parse-time calls to user-defined I/O functions for constants --- which might need a snapshot. It might be possible to redesign things so that all tables are locked before we do anything that requires a non-SnapshotNow snapshot, and then take a single "planning/execution" snapshot. But that is not this patch, and would be a lot more invasive than this patch, and would certainly not be back-patchable to 9.2. I think we have to revert and go back to the drawing board on this. regards, tom lane
On Wed, Oct 10, 2012 at 08:43:34PM -0400, Tom Lane wrote: > > It seems to me that the > > root of the issue here is that people is not that people expect two > > snapshots -- indeed, a number of people strongly supported getting rid > > of that behavior at the time -- but rather that they expect the > > snapshot to be taken after locks are acquired. > > Sure. Maybe we can rejigger things in a way that does that, although > I think the stumbling block is going to be parse-time calls to > user-defined I/O functions for constants --- which might need a > snapshot. It might be possible to redesign things so that all tables > are locked before we do anything that requires a non-SnapshotNow > snapshot, and then take a single "planning/execution" snapshot. But > that is not this patch, and would be a lot more invasive than this > patch, and would certainly not be back-patchable to 9.2. > > I think we have to revert and go back to the drawing board on this. Is reverting going to adversely affect users who are already using the 9.2 behavior? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Bruce Momjian <bruce@momjian.us> writes: > On Wed, Oct 10, 2012 at 08:43:34PM -0400, Tom Lane wrote: >> I think we have to revert and go back to the drawing board on this. > Is reverting going to adversely affect users who are already using the > 9.2 behavior? In what way would somebody be relying on the 9.2 behavior? regards, tom lane
On Wed, Oct 10, 2012 at 09:29:16PM -0400, Tom Lane wrote: > Bruce Momjian <bruce@momjian.us> writes: > > On Wed, Oct 10, 2012 at 08:43:34PM -0400, Tom Lane wrote: > >> I think we have to revert and go back to the drawing board on this. > > > Is reverting going to adversely affect users who are already using the > > 9.2 behavior? > > In what way would somebody be relying on the 9.2 behavior? I don't know. I am just asking if an application could be relying on the 9.2 behavior. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
Bruce Momjian <bruce@momjian.us> writes: > On Wed, Oct 10, 2012 at 09:29:16PM -0400, Tom Lane wrote: >> In what way would somebody be relying on the 9.2 behavior? > I don't know. I am just asking if an application could be relying on > the 9.2 behavior. I don't think so. Robert suggested in the original discussion that there could be cases where users would notice if the plan snapshot was different from the execution snapshot, but on reflection I consider that argument bogus. We have these categories of snapshot-dependent things that happen before execution starts: * System examination of table DDL. This is all done with SnapshotNow and after acquiring a lock on the table, so it's secure. * Evaluation of the input functions for user-defined types and domains (the latter of which can invoke nearly arbitrary code via CHECK constraints). In principle you could imagine that one of these datatypes or domain check constraints involves looking at the tables that the surrounding query is going to touch, but I don't think anybody would consider that good programming style, much less expect that it would necessarily see exactly the same table state as query execution does. * Evaluation of IMMUTABLE functions at plan time. If such a function actually cares exactly which snapshot it runs with, then it's not very immutable, and I feel no compunction about breaking it. * Evaluation of STABLE functions at plan time for estimation purposes. Such a function might well get different answers depending on which snapshot it uses --- but the result is only used for estimation purposes. The worst possible consequence is an inferior plan; there is no correctness issue. So it seems to me to be pretty difficult to credit that any of these things would care at all whether they saw the exact same snapshot that query execution does. It's even less plausible that somebody would have created such a dependency in the short time 9.2 has been out, when such code could not have worked in any prior release. regards, tom lane
On Wed, Oct 10, 2012 at 10:01:30PM -0400, Tom Lane wrote: > Bruce Momjian <bruce@momjian.us> writes: > > On Wed, Oct 10, 2012 at 09:29:16PM -0400, Tom Lane wrote: > >> In what way would somebody be relying on the 9.2 behavior? > > > I don't know. I am just asking if an application could be relying on > > the 9.2 behavior. > > I don't think so. Robert suggested in the original discussion that > there could be cases where users would notice if the plan snapshot was > different from the execution snapshot, but on reflection I consider that > argument bogus. We have these categories of snapshot-dependent things > that happen before execution starts: > > * System examination of table DDL. This is all done with SnapshotNow > and after acquiring a lock on the table, so it's secure. > > * Evaluation of the input functions for user-defined types and domains > (the latter of which can invoke nearly arbitrary code via CHECK > constraints). In principle you could imagine that one of these > datatypes or domain check constraints involves looking at the tables > that the surrounding query is going to touch, but I don't think anybody > would consider that good programming style, much less expect that it > would necessarily see exactly the same table state as query execution > does. > > * Evaluation of IMMUTABLE functions at plan time. If such a function > actually cares exactly which snapshot it runs with, then it's not very > immutable, and I feel no compunction about breaking it. > > * Evaluation of STABLE functions at plan time for estimation purposes. > Such a function might well get different answers depending on which > snapshot it uses --- but the result is only used for estimation > purposes. The worst possible consequence is an inferior plan; there is > no correctness issue. > > So it seems to me to be pretty difficult to credit that any of these > things would care at all whether they saw the exact same snapshot that > query execution does. It's even less plausible that somebody would have > created such a dependency in the short time 9.2 has been out, when such > code could not have worked in any prior release. Thanks. You have covered all I was asking. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + It's impossible for everything to be true. +
On 11 October 2012 01:43, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I think we have to revert and go back to the drawing board on this. Given that change was also sold on the basis of higher performance, I suggest we retest performance to check there is a gain. If there is still a gain, I suggest we add this as a SIGHUP option, default to off, rather than completely remove it. I might also observe since the problem only happens with lock waits, perhaps we can set a flag can_reuse_snapshot that gets cleared if we need to perform a lock wait before executing the main statement. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Simon Riggs <simon@2ndQuadrant.com> writes: > On 11 October 2012 01:43, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I think we have to revert and go back to the drawing board on this. > Given that change was also sold on the basis of higher performance, I > suggest we retest performance to check there is a gain. If there is > still a gain, I suggest we add this as a SIGHUP option, default to > off, rather than completely remove it. I'm not in favor of adding a GUC for this. The right fix is to redesign the locking/snapshotting process, not expose its warts in bizarre little knobs that make users deal with the tradeoffs. Maybe what we really need is to find a way to make taking a snapshot a lot cheaper, such that the whole need for this patch goes away. We're not going to get far with the idea of making SnapshotNow MVCC-safe unless it becomes a lot cheaper to get an MVCC snapshot. I recall some discussion of trying to reduce a snapshot to a WAL offset --- did that idea crash and burn, or is it still viable? Anyway, I believe that for now we ought to revert and rethink, not look for band-aid ways of preserving this patch. regards, tom lane
On 11 October 2012 17:53, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Simon Riggs <simon@2ndQuadrant.com> writes: >> On 11 October 2012 01:43, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> I think we have to revert and go back to the drawing board on this. > >> Given that change was also sold on the basis of higher performance, I >> suggest we retest performance to check there is a gain. If there is >> still a gain, I suggest we add this as a SIGHUP option, default to >> off, rather than completely remove it. > > I'm not in favor of adding a GUC for this. The right fix is to redesign > the locking/snapshotting process, not expose its warts in bizarre little > knobs that make users deal with the tradeoffs. While I agree with that thought, I'd like to try a little harder than simply revert. > Maybe what we really need is to find a way to make taking a snapshot a > lot cheaper, such that the whole need for this patch goes away. We're > not going to get far with the idea of making SnapshotNow MVCC-safe > unless it becomes a lot cheaper to get an MVCC snapshot. I recall some > discussion of trying to reduce a snapshot to a WAL offset --- did that > idea crash and burn, or is it still viable? I think that is still at the "fond wish" stage and definitely not backpatchable in this universe. > Anyway, I believe that for now we ought to revert and rethink, not look > for band-aid ways of preserving this patch. I suggested a way to automatically trigger a second snapshot. I think that would be acceptable to backpatch. But since I'm leaving this to you, I'll leave that decision to you as well. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Simon Riggs <simon@2ndQuadrant.com> writes: > On 11 October 2012 17:53, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Maybe what we really need is to find a way to make taking a snapshot a >> lot cheaper, such that the whole need for this patch goes away. We're >> not going to get far with the idea of making SnapshotNow MVCC-safe >> unless it becomes a lot cheaper to get an MVCC snapshot. I recall some >> discussion of trying to reduce a snapshot to a WAL offset --- did that >> idea crash and burn, or is it still viable? > I think that is still at the "fond wish" stage and definitely not > backpatchable in this universe. Of course not. This performance improvement is simply lost for 9.2. We can try to do better in future releases, but it's a failed experiment for now. Life is hard :-( > I suggested a way to automatically trigger a second snapshot. I think > that would be acceptable to backpatch. If it worked, I might be amenable to that, but it doesn't. You can't trigger taking a new snapshot off whether we waited for a lock; that still has race conditions, just ones that are not so trivial to demonstrate manually. (The other transaction might have committed microseconds before you reach the point of waiting for the lock.) It would have to be a rule like "take a new snapshot if we acquired any new lock since the previous snapshot". While that would work, we'd end up with no performance gain worth mentioning, since there would almost always be some lock acquisitions during parsing. regards, tom lane
On 11 October 2012 18:22, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I suggested a way to automatically trigger a second snapshot. I think >> that would be acceptable to backpatch. > > If it worked, I might be amenable to that, but it doesn't. You can't > trigger taking a new snapshot off whether we waited for a lock; that > still has race conditions, just ones that are not so trivial to > demonstrate manually. (The other transaction might have committed > microseconds before you reach the point of waiting for the lock.) > It would have to be a rule like "take a new snapshot if we acquired > any new lock since the previous snapshot". While that would work, > we'd end up with no performance gain worth mentioning, since there > would almost always be some lock acquisitions during parsing. So where's the race? AFAICS it either waits or it doesn't - the code isn't vague on that point. If we wait we set the flag. The point is that lock waits are pretty rare since most locks are compatible, so triggering a second snap if we waited is not any kind of problem, even if we waited for a very short time. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Oct 11, 2012 at 2:23 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > So where's the race? > > AFAICS it either waits or it doesn't - the code isn't vague on that > point. If we wait we set the flag. > > The point is that lock waits are pretty rare since most locks are > compatible, so triggering a second snap if we waited is not any kind > of problem, even if we waited for a very short time. That actually wouldn't fix the problem, because we might have this scenario: 1. We take a snapshot. 2. Some other process commits, clearing its XID from its PGPROC and releasing locks. 3. We take a lock. :-( -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Simon Riggs <simon@2ndQuadrant.com> writes: > On 11 October 2012 18:22, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> If it worked, I might be amenable to that, but it doesn't. You can't >> trigger taking a new snapshot off whether we waited for a lock; that >> still has race conditions, just ones that are not so trivial to >> demonstrate manually. (The other transaction might have committed >> microseconds before you reach the point of waiting for the lock.) > So where's the race? Same example as before, except that the exclusive-lock-holding transaction commits (and releases its lock) between the time that the other transaction takes its parse/plan snapshot and the time that it takes AccessShare lock on the table. regards, tom lane
On 11 October 2012 19:36, Robert Haas <robertmhaas@gmail.com> wrote: > On Thu, Oct 11, 2012 at 2:23 PM, Simon Riggs <simon@2ndquadrant.com> wrote: >> So where's the race? >> >> AFAICS it either waits or it doesn't - the code isn't vague on that >> point. If we wait we set the flag. >> >> The point is that lock waits are pretty rare since most locks are >> compatible, so triggering a second snap if we waited is not any kind >> of problem, even if we waited for a very short time. > > That actually wouldn't fix the problem, because we might have this scenario: > > 1. We take a snapshot. > 2. Some other process commits, clearing its XID from its PGPROC and > releasing locks. > 3. We take a lock. Hmm, so now the patch author thinks his patch is not just broken with respect to lock waits, but in all cases? Surely the above race condition is obvious, now and before. Why is it suddenly unacceptable? (If you believe that, why on earth did you commit?) Whenever you take a snapshot things can change before you start using it. And as a result all previous releases of Postgres suffer this problem. Ergo, we should defer taking the snapshot until the very last point when we need to use it. Why is that *not* being suggested here? -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 11 October 2012 19:41, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Simon Riggs <simon@2ndQuadrant.com> writes: >> On 11 October 2012 18:22, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> If it worked, I might be amenable to that, but it doesn't. You can't >>> trigger taking a new snapshot off whether we waited for a lock; that >>> still has race conditions, just ones that are not so trivial to >>> demonstrate manually. (The other transaction might have committed >>> microseconds before you reach the point of waiting for the lock.) > >> So where's the race? > > Same example as before, except that the exclusive-lock-holding > transaction commits (and releases its lock) between the time that the > other transaction takes its parse/plan snapshot and the time that it > takes AccessShare lock on the table. A cache invalidation could also set the need-second-snapshot flag. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Oct 11, 2012 at 2:48 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > Hmm, so now the patch author thinks his patch is not just broken with > respect to lock waits, but in all cases? Surely the above race > condition is obvious, now and before. Why is it suddenly unacceptable? > (If you believe that, why on earth did you commit?) > > Whenever you take a snapshot things can change before you start using > it. And as a result all previous releases of Postgres suffer this > problem. I agree with this. I think that the reversion that Tom is pushing for is fixing a very special case of a more general problem. Now, it might be a sufficiently important special-case that we ought to go and do the revert, but if your argument is that this is an unsatisfying band-aid, I couldn't agree more. It appears to me that this discussion is exposing the fact that we really haven't given much thought to when snapshots ought to be taken or to arranging the order of operations so that they are taken as late as is safely possible. > Ergo, we should defer taking the snapshot until the very last > point when we need to use it. Why is that *not* being suggested here? Well, that's actually not safe either, at least if taken literally. For example, you can't do part of a sequential scan without a snapshot just because all the tuples are frozen, and then take a snapshot when you hit a unfrozen xmin/xmax. Both you and I previously came up with that idea independently, and it would save a lot of cycles, but it's unsafe. Some other backend can update a tuple after you look at it, and then commit. When you get to the new tuple, you'll already have returned the old tuple, and your new snapshot - since it's taken after the commit - will also see the new tuple. So we have to take the snapshot before you begin execution, but it seems that to avoid surprising behavior we also have to take it after acquiring locks. And it looks like locking is intertwined with a bunch of other parse analysis tasks that might require a snapshot to have been taken first. Whee. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 11 October 2012 20:25, Robert Haas <robertmhaas@gmail.com> wrote: > On Thu, Oct 11, 2012 at 2:48 PM, Simon Riggs <simon@2ndquadrant.com> wrote: >> Hmm, so now the patch author thinks his patch is not just broken with >> respect to lock waits, but in all cases? Surely the above race >> condition is obvious, now and before. Why is it suddenly unacceptable? >> (If you believe that, why on earth did you commit?) >> >> Whenever you take a snapshot things can change before you start using >> it. And as a result all previous releases of Postgres suffer this >> problem. > > I agree with this. I think that the reversion that Tom is pushing for > is fixing a very special case of a more general problem. Now, it > might be a sufficiently important special-case that we ought to go and > do the revert, but if your argument is that this is an unsatisfying > band-aid, I couldn't agree more. It appears to me that this > discussion is exposing the fact that we really haven't given much > thought to when snapshots ought to be taken or to arranging the order > of operations so that they are taken as late as is safely possible. > >> Ergo, we should defer taking the snapshot until the very last >> point when we need to use it. Why is that *not* being suggested here? > > Well, that's actually not safe either, at least if taken literally. Right - I didn't mean that far - since it was me suggested deferring snapshots previously I'm aware that doesn't work. But immediately before we read the first heap row... since we might need to wait on index access, heap I/O etc. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Robert Haas <robertmhaas@gmail.com> writes: > So we have to take the snapshot before you begin execution, but it > seems that to avoid surprising behavior we also have to take it after > acquiring locks. And it looks like locking is intertwined with a > bunch of other parse analysis tasks that might require a snapshot to > have been taken first. Whee. Yeah. I think that a good solution to this would involve guaranteeing that the execution snapshot is not taken until we have all locks that are going to be taken on the tables. Which is likely to involve a fair amount of refactoring, though I admit I've not looked at details. In any case, it's a mistake to think about this in isolation. If we're going to do something about redefining SnapshotNow to avoid its race conditions, that's going to move the goalposts quite a lot. Anyway, my feeling about it is that I don't want 9.2 to have an intermediate behavior between the historical one and whatever we end up designing to satisfy these concerns. That's why I'm pressing for reversion and not a band-aid fix in 9.2. I certainly hope we can do better going forward, but this is not looking like whatever we come up with would be sane to back-patch. regards, tom lane
On 11 October 2012 20:43, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> So we have to take the snapshot before you begin execution, but it >> seems that to avoid surprising behavior we also have to take it after >> acquiring locks. And it looks like locking is intertwined with a >> bunch of other parse analysis tasks that might require a snapshot to >> have been taken first. Whee. > > Yeah. I think that a good solution to this would involve guaranteeing > that the execution snapshot is not taken until we have all locks that > are going to be taken on the tables. Which is likely to involve a fair > amount of refactoring, though I admit I've not looked at details. > > In any case, it's a mistake to think about this in isolation. If we're > going to do something about redefining SnapshotNow to avoid its race > conditions, that's going to move the goalposts quite a lot. > > Anyway, my feeling about it is that I don't want 9.2 to have an > intermediate behavior between the historical one and whatever we end up > designing to satisfy these concerns. That's why I'm pressing for > reversion and not a band-aid fix in 9.2. I certainly hope we can do > better going forward, but this is not looking like whatever we come up > with would be sane to back-patch. Agreed, please revert. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Oct 11, 2012 at 7:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Maybe what we really need is to find a way to make taking a snapshot a > lot cheaper, such that the whole need for this patch goes away. We're > not going to get far with the idea of making SnapshotNow MVCC-safe > unless it becomes a lot cheaper to get an MVCC snapshot. I recall some > discussion of trying to reduce a snapshot to a WAL offset --- did that > idea crash and burn, or is it still viable? This was mostly covered in the cheaper snapshots thread. [1] Robert decided to abandon the idea after concluding that the memory overhead was untenable with very old snapshots. [2] I had a really hand-wavy idea of lazily converting snapshots from sequence number based snapshots to traditional list of xids snapshots to limit the overhead. That idea was promptly shot down because in that incarnation it needed snapshots to be stored in shared memory. [3] I have done some more thinking on this topic, although I have to admit that it has been on the backburner. It seems to me that the problems are all surmountable. To recap shortly, the idea is to define visibility and snapshots through commit sequence numbers (LSNs have problems due to async commit). The tricky part is the datastructure to support fast xid-to-csn lookup for visibility checks. To support visibility checks enough information needs to be kept so that the oldest CSN based snapshot can resolve its xmin-xmax range to csns. My idea currently is to have two fixed size shared memory buffers and an overflow log. The first ring buffer is a dense array mapping of xids to csns. The overflow entries from the dense ring buffer are checked if they might be invisible to any CSN based snapshots, and if so inserted into the sparse buffer. The sparse buffer is a sorted array containing xid-csn pairs that are still running or are concurrent with an active CSN based snapshot. Once the sparse buffer is filled up, the smallest xid-csn pairs are evicted to the CSN log. The long running CSN based snapshots then need to read this log to build up the SnapshotData->xip/subxip arrays. The backends can either discover that their snapshots CSNs values have overflowed by checking the appropriate horizon value, or be signaled via an interrupt to enable CSN log cleanup ASAP. I still have to work out some details on how to handle subtransaction overflow, how to maintain reasonably fresh values for different horizons and what are necessary ordering barriers to get lock-free visibility checks. The idea currently seems workable and will make taking snapshots really cheap, while the worst case maintenance overhead is mostly shifted to sessions that acquire lots of writing transactions and hold snapshots open for a long time. If anyone is interested I can do a slightly longer write up detailing what I have worked out so far. Ants Aasma [1] http://archives.postgresql.org/message-id/CA%2BTgmoaAjiq%3Dd%3DkYt3qNj%2BUvi%2BMB-aRovCwr75Ca9egx-Ks9Ag%40mail.gmail.com [2] http://archives.postgresql.org/message-id/CA%2BTgmoYD6EhYy1Rb%2BSEuns5smreY1_3rAMeL%3D76rX8deijy56Q%40mail.gmail.com [3] http://archives.postgresql.org/message-id/CA%2BCSw_uDfg2SBMicGNu13bpr2upbnVL_edoTbzvacR1FrNrZ1g%40mail.gmail.com -- Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt Web: http://www.postgresql-support.de
Simon Riggs <simon@2ndQuadrant.com> writes: > On 11 October 2012 20:43, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Robert Haas <robertmhaas@gmail.com> writes: >>> So we have to take the snapshot before you begin execution, but it >>> seems that to avoid surprising behavior we also have to take it after >>> acquiring locks. And it looks like locking is intertwined with a >>> bunch of other parse analysis tasks that might require a snapshot to >>> have been taken first. Whee. >> Yeah. I think that a good solution to this would involve guaranteeing >> that the execution snapshot is not taken until we have all locks that >> are going to be taken on the tables. Which is likely to involve a fair >> amount of refactoring, though I admit I've not looked at details. >> >> In any case, it's a mistake to think about this in isolation. If we're >> going to do something about redefining SnapshotNow to avoid its race >> conditions, that's going to move the goalposts quite a lot. >> >> Anyway, my feeling about it is that I don't want 9.2 to have an >> intermediate behavior between the historical one and whatever we end up >> designing to satisfy these concerns. That's why I'm pressing for >> reversion and not a band-aid fix in 9.2. I certainly hope we can do >> better going forward, but this is not looking like whatever we come up >> with would be sane to back-patch. > Agreed, please revert. We have to do something about this one way or another before we can ship 9.2.2. Is the consensus to revert this patch: http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=d573e239f03506920938bf0be56c868d9c3416da and if so, who's going to do the deed? regards, tom lane
On Mon, Nov 26, 2012 at 10:55 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Simon Riggs <simon@2ndQuadrant.com> writes: >> On 11 October 2012 20:43, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Robert Haas <robertmhaas@gmail.com> writes: >>>> So we have to take the snapshot before you begin execution, but it >>>> seems that to avoid surprising behavior we also have to take it after >>>> acquiring locks. And it looks like locking is intertwined with a >>>> bunch of other parse analysis tasks that might require a snapshot to >>>> have been taken first. Whee. > >>> Yeah. I think that a good solution to this would involve guaranteeing >>> that the execution snapshot is not taken until we have all locks that >>> are going to be taken on the tables. Which is likely to involve a fair >>> amount of refactoring, though I admit I've not looked at details. >>> >>> In any case, it's a mistake to think about this in isolation. If we're >>> going to do something about redefining SnapshotNow to avoid its race >>> conditions, that's going to move the goalposts quite a lot. >>> >>> Anyway, my feeling about it is that I don't want 9.2 to have an >>> intermediate behavior between the historical one and whatever we end up >>> designing to satisfy these concerns. That's why I'm pressing for >>> reversion and not a band-aid fix in 9.2. I certainly hope we can do >>> better going forward, but this is not looking like whatever we come up >>> with would be sane to back-patch. > >> Agreed, please revert. > > We have to do something about this one way or another before we can ship > 9.2.2. Is the consensus to revert this patch: > http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=d573e239f03506920938bf0be56c868d9c3416da > and if so, who's going to do the deed? I was assuming you were going to do it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Mon, Nov 26, 2012 at 10:55 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> We have to do something about this one way or another before we can ship >> 9.2.2. Is the consensus to revert this patch: >> http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=d573e239f03506920938bf0be56c868d9c3416da >> and if so, who's going to do the deed? > I was assuming you were going to do it. OK, will take care of it. regards, tom lane