Thread: change in LOCK behavior

change in LOCK behavior

From

Tomas Vondra

Date:

10 October 2012, 20:21:53

Hi,

I've just noticed a change of LOCK command behavior between 9.1 and 9.2,
and I'm not sure whether this is expected or not.

Let's use a very simple table
 CREATE TABLE x (id INT);

Say there are two sessions - A and B, where A performs some operations
on "x" and needs to protect them with an "ACCESS EXCLUSIVE" lock (e.g.
it might be a pg_bulkload that acquires such locks, and we need to do
that explicitly on one or two places).

Session B is attempting to read the data, but is blocked and waits. On
9.1 it sees the commited data (which is what we need) but on 9.2 it sees
only data commited at the time of the lock attemt.

Example:

A: BEGIN;
A: LOCK x IN ACCESS EXCLUSIVE MODE;
A: INSERT INTO x VALUES (100);
B: SELECT * FROM x;
A: COMMIT;

Now on 9.1, B receives the value "100" while on 9.2 it gets no rows.

Is this expected? I suspect the snapshot is read at different time or
something, but I've checked release notes but I haven't seen anything
relevant.

Without getting the commited version of data, the locking is somehow
pointless for us (unless using a different lock, not the table itself).

regards
Tomas

Re: change in LOCK behavior

From

"ktm@rice.edu"

Date:

10 October 2012, 20:37:08

On Wed, Oct 10, 2012 at 10:21:51PM +0200, Tomas Vondra wrote:
> Hi,
> 
> I've just noticed a change of LOCK command behavior between 9.1 and 9.2,
> and I'm not sure whether this is expected or not.
> 
> Let's use a very simple table
> 
>   CREATE TABLE x (id INT);
> 
> Say there are two sessions - A and B, where A performs some operations
> on "x" and needs to protect them with an "ACCESS EXCLUSIVE" lock (e.g.
> it might be a pg_bulkload that acquires such locks, and we need to do
> that explicitly on one or two places).
> 
> Session B is attempting to read the data, but is blocked and waits. On
> 9.1 it sees the commited data (which is what we need) but on 9.2 it sees
> only data commited at the time of the lock attemt.
> 
> Example:
> 
> A: BEGIN;
> A: LOCK x IN ACCESS EXCLUSIVE MODE;
> A: INSERT INTO x VALUES (100);
> B: SELECT * FROM x;
> A: COMMIT;
> 
> Now on 9.1, B receives the value "100" while on 9.2 it gets no rows.
> 
> Is this expected? I suspect the snapshot is read at different time or
> something, but I've checked release notes but I haven't seen anything
> relevant.
> 
> Without getting the commited version of data, the locking is somehow
> pointless for us (unless using a different lock, not the table itself).
> 
> regards
> Tomas
> 
Hi Tomas,

9.2 is doing it right. Per the documentation on explicit locking:

http://www.postgresql.org/docs/9.2/static/explicit-locking.html

Tip: Only an ACCESS EXCLUSIVE lock blocks a SELECT (without FOR UPDATE/SHARE) statement.

Regards,
Ken

Re: change in LOCK behavior

From

Tomas Vondra

Date:

10 October 2012, 20:40:15

On 10.10.2012 22:37, ktm@rice.edu wrote:
> On Wed, Oct 10, 2012 at 10:21:51PM +0200, Tomas Vondra wrote:
>> Example:
>>
>> A: BEGIN;
>> A: LOCK x IN ACCESS EXCLUSIVE MODE;
>> A: INSERT INTO x VALUES (100);
>> B: SELECT * FROM x;
>> A: COMMIT;
>>
>> Now on 9.1, B receives the value "100" while on 9.2 it gets no rows.
>>
>> Is this expected? I suspect the snapshot is read at different time or
>> something, but I've checked release notes but I haven't seen anything
>> relevant.
>>
>> Without getting the commited version of data, the locking is somehow
>> pointless for us (unless using a different lock, not the table itself).
>>
>> regards
>> Tomas
>>
> Hi Tomas,
> 
> 9.2 is doing it right. Per the documentation on explicit locking:
> 
> http://www.postgresql.org/docs/9.2/static/explicit-locking.html
> 
> Tip: Only an ACCESS EXCLUSIVE lock blocks a SELECT (without FOR UPDATE/SHARE) statement.

That is not the problem. We do expect it to block (that's why we do this
kind of lock in the first place), and that does happen both on 9.1 and 9.2.

The difference is that 9.1 does see the changes performed in the other
session (that held the lock and released it on commit), while 9.2 does not.

Tomas

Re: change in LOCK behavior

From

Andres Freund

Date:

10 October 2012, 20:42:22

On Wednesday, October 10, 2012 10:21:51 PM Tomas Vondra wrote:
> Hi,
> 
> I've just noticed a change of LOCK command behavior between 9.1 and 9.2,
> and I'm not sure whether this is expected or not.
> 
> Let's use a very simple table
> 
>   CREATE TABLE x (id INT);
> 
> Say there are two sessions - A and B, where A performs some operations
> on "x" and needs to protect them with an "ACCESS EXCLUSIVE" lock (e.g.
> it might be a pg_bulkload that acquires such locks, and we need to do
> that explicitly on one or two places).
> 
> Session B is attempting to read the data, but is blocked and waits. On
> 9.1 it sees the commited data (which is what we need) but on 9.2 it sees
> only data commited at the time of the lock attemt.
> 
> Example:
> 
> A: BEGIN;
> A: LOCK x IN ACCESS EXCLUSIVE MODE;
> A: INSERT INTO x VALUES (100);
> B: SELECT * FROM x;
> A: COMMIT;
> 
> Now on 9.1, B receives the value "100" while on 9.2 it gets no rows.
> 
> Is this expected? I suspect the snapshot is read at different time or
> something, but I've checked release notes but I haven't seen anything
> relevant.
> 
> Without getting the commited version of data, the locking is somehow
> pointless for us (unless using a different lock, not the table itself).
That sounds like youre using different isolation levels in 9.1 and 9.2. Is that 
possible? I.e. your 9.1 test uses read committed, and 9.2 uses repeatable read 
or serializable.

Greetings,

Andres
-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services

Re: change in LOCK behavior

From

Thom Brown

Date:

10 October 2012, 20:44:30

On 10 October 2012 21:21, Tomas Vondra <tv@fuzzy.cz> wrote:
> Hi,
>
> I've just noticed a change of LOCK command behavior between 9.1 and 9.2,
> and I'm not sure whether this is expected or not.
>
> Let's use a very simple table
>
>   CREATE TABLE x (id INT);
>
> Say there are two sessions - A and B, where A performs some operations
> on "x" and needs to protect them with an "ACCESS EXCLUSIVE" lock (e.g.
> it might be a pg_bulkload that acquires such locks, and we need to do
> that explicitly on one or two places).
>
> Session B is attempting to read the data, but is blocked and waits. On
> 9.1 it sees the commited data (which is what we need) but on 9.2 it sees
> only data commited at the time of the lock attemt.
>
> Example:
>
> A: BEGIN;
> A: LOCK x IN ACCESS EXCLUSIVE MODE;
> A: INSERT INTO x VALUES (100);
> B: SELECT * FROM x;
> A: COMMIT;
>
> Now on 9.1, B receives the value "100" while on 9.2 it gets no rows.
>
> Is this expected? I suspect the snapshot is read at different time or
> something, but I've checked release notes but I haven't seen anything
> relevant.
>
> Without getting the commited version of data, the locking is somehow
> pointless for us (unless using a different lock, not the table itself).

I suspect it's this commit: d573e239f03506920938bf0be56c868d9c3416da

http://archives.postgresql.org/pgsql-committers/2011-12/msg00167.php
-- 
Thom

Re: change in LOCK behavior

From

Tomas Vondra

Date:

10 October 2012, 20:48:40

On 10.10.2012 22:42, Andres Freund wrote:
> On Wednesday, October 10, 2012 10:21:51 PM Tomas Vondra wrote:
>> Hi,
>>
>> I've just noticed a change of LOCK command behavior between 9.1 and 9.2,
>> and I'm not sure whether this is expected or not.
>>
>> Let's use a very simple table
>>
>>   CREATE TABLE x (id INT);
>>
>> Say there are two sessions - A and B, where A performs some operations
>> on "x" and needs to protect them with an "ACCESS EXCLUSIVE" lock (e.g.
>> it might be a pg_bulkload that acquires such locks, and we need to do
>> that explicitly on one or two places).
>>
>> Session B is attempting to read the data, but is blocked and waits. On
>> 9.1 it sees the commited data (which is what we need) but on 9.2 it sees
>> only data commited at the time of the lock attemt.
>>
>> Example:
>>
>> A: BEGIN;
>> A: LOCK x IN ACCESS EXCLUSIVE MODE;
>> A: INSERT INTO x VALUES (100);
>> B: SELECT * FROM x;
>> A: COMMIT;
>>
>> Now on 9.1, B receives the value "100" while on 9.2 it gets no rows.
>>
>> Is this expected? I suspect the snapshot is read at different time or
>> something, but I've checked release notes but I haven't seen anything
>> relevant.
>>
>> Without getting the commited version of data, the locking is somehow
>> pointless for us (unless using a different lock, not the table itself).
> That sounds like youre using different isolation levels in 9.1 and 9.2. Is that 
> possible? I.e. your 9.1 test uses read committed, and 9.2 uses repeatable read 
> or serializable.

Nope, it's 'read commited' on both. I haven't touched this, but I've
verified it to be sure.

============ 9.1 ============

$ psql testdb
psql (9.1.6)
Type "help" for help.

testdb=# show server_version;server_version
----------------9.1.6
(1 row)

testdb=# show transaction_isolation ;transaction_isolation
-----------------------read committed
(1 row)

============ 9.2 ============

$ psql testdb
psql (9.2.0)
Type "help" for help.

testdb=# show server_version;server_version
----------------9.2.0
(1 row)

testdb=# show transaction_isolation
testdb-# ;transaction_isolation
-----------------------read committed
(1 row)

Re: change in LOCK behavior

From

Tomas Vondra

Date:

10 October 2012, 20:51:17

On 10.10.2012 22:43, Thom Brown wrote:
> On 10 October 2012 21:21, Tomas Vondra <tv@fuzzy.cz> wrote:
>> Example:
>>
>> A: BEGIN;
>> A: LOCK x IN ACCESS EXCLUSIVE MODE;
>> A: INSERT INTO x VALUES (100);
>> B: SELECT * FROM x;
>> A: COMMIT;
>>
>> Now on 9.1, B receives the value "100" while on 9.2 it gets no rows.
>>
>> Is this expected? I suspect the snapshot is read at different time or
>> something, but I've checked release notes but I haven't seen anything
>> relevant.
>>
>> Without getting the commited version of data, the locking is somehow
>> pointless for us (unless using a different lock, not the table itself).
> 
> I suspect it's this commit: d573e239f03506920938bf0be56c868d9c3416da
> 
> http://archives.postgresql.org/pgsql-committers/2011-12/msg00167.php

Maybe, the description suggests it might be related. I'm still not sure
whether this is a bug or expected behavior, although the commit clearly
states that the change shouldn't be user-visible.

Tomas

Re: change in LOCK behavior

From

Andres Freund

Date:

10 October 2012, 21:05:31

On Wednesday, October 10, 2012 10:43:57 PM Thom Brown wrote:
> On 10 October 2012 21:21, Tomas Vondra <tv@fuzzy.cz> wrote:
> > Hi,
> > 
> > I've just noticed a change of LOCK command behavior between 9.1 and 9.2,
> > and I'm not sure whether this is expected or not.
> > 
> > Let's use a very simple table
> > 
> >   CREATE TABLE x (id INT);
> > 
> > Say there are two sessions - A and B, where A performs some operations
> > on "x" and needs to protect them with an "ACCESS EXCLUSIVE" lock (e.g.
> > it might be a pg_bulkload that acquires such locks, and we need to do
> > that explicitly on one or two places).
> > 
> > Session B is attempting to read the data, but is blocked and waits. On
> > 9.1 it sees the commited data (which is what we need) but on 9.2 it sees
> > only data commited at the time of the lock attemt.
> > 
> > Example:
> > 
> > A: BEGIN;
> > A: LOCK x IN ACCESS EXCLUSIVE MODE;
> > A: INSERT INTO x VALUES (100);
> > B: SELECT * FROM x;
> > A: COMMIT;
> > 
> > Now on 9.1, B receives the value "100" while on 9.2 it gets no rows.
> > 
> > Is this expected? I suspect the snapshot is read at different time or
> > something, but I've checked release notes but I haven't seen anything
> > relevant.
> > 
> > Without getting the commited version of data, the locking is somehow
> > pointless for us (unless using a different lock, not the table itself).
> 
> I suspect it's this commit: d573e239f03506920938bf0be56c868d9c3416da
> 
> http://archives.postgresql.org/pgsql-committers/2011-12/msg00167.php
Very likely, yes. In fact you get the same beaviour in 9.1 if you modify the 
example slightly:

B: PREPARE foo AS SELECT * FROM x;
A: BEGIN;
A: LOCK x IN ACCESS EXCLUSIVE MODE;
A: INSERT INTO x VALUES (100);
B: EXECUTE foo;
A: COMMIT;

If you think about it for a second its not that surprising anymore. We start to 
execute a query, acquire a snapshot for that, and then wait for the locks on 
the target relations. We continue executing in the same snapshot for the 
duration of the statement and thus cannot see any of the new rows which 
committed *after* we assembled our snapshot.

The easy workaround is acquiring a AccessShareLock in the B transaction 
separately.

Greetings,

Andres
-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services

Re: change in LOCK behavior

From

Tomas Vondra

Date:

10 October 2012, 21:23:17

On 10.10.2012 23:05, Andres Freund wrote:
> On Wednesday, October 10, 2012 10:43:57 PM Thom Brown wrote:
>> On 10 October 2012 21:21, Tomas Vondra <tv@fuzzy.cz> wrote:
>>> Hi,
>>>
>>> I've just noticed a change of LOCK command behavior between 9.1 and 9.2,
>>> and I'm not sure whether this is expected or not.
>>>
>>> Let's use a very simple table
>>>
>>>   CREATE TABLE x (id INT);
>>>
>>> Say there are two sessions - A and B, where A performs some operations
>>> on "x" and needs to protect them with an "ACCESS EXCLUSIVE" lock (e.g.
>>> it might be a pg_bulkload that acquires such locks, and we need to do
>>> that explicitly on one or two places).
>>>
>>> Session B is attempting to read the data, but is blocked and waits. On
>>> 9.1 it sees the commited data (which is what we need) but on 9.2 it sees
>>> only data commited at the time of the lock attemt.
>>>
>>> Example:
>>>
>>> A: BEGIN;
>>> A: LOCK x IN ACCESS EXCLUSIVE MODE;
>>> A: INSERT INTO x VALUES (100);
>>> B: SELECT * FROM x;
>>> A: COMMIT;
>>>
>>> Now on 9.1, B receives the value "100" while on 9.2 it gets no rows.
>>>
>>> Is this expected? I suspect the snapshot is read at different time or
>>> something, but I've checked release notes but I haven't seen anything
>>> relevant.
>>>
>>> Without getting the commited version of data, the locking is somehow
>>> pointless for us (unless using a different lock, not the table itself).
>>
>> I suspect it's this commit: d573e239f03506920938bf0be56c868d9c3416da
>>
>> http://archives.postgresql.org/pgsql-committers/2011-12/msg00167.php
> Very likely, yes. In fact you get the same beaviour in 9.1 if you modify the 
> example slightly:
> 
> B: PREPARE foo AS SELECT * FROM x;
> A: BEGIN;
> A: LOCK x IN ACCESS EXCLUSIVE MODE;
> A: INSERT INTO x VALUES (100);
> B: EXECUTE foo;
> A: COMMIT;
> 
> If you think about it for a second its not that surprising anymore. We start to 
> execute a query, acquire a snapshot for that, and then wait for the locks on 
> the target relations. We continue executing in the same snapshot for the 
> duration of the statement and thus cannot see any of the new rows which 
> committed *after* we assembled our snapshot.

Yes, that was my guess too (that the snapshot is acquired before asking
for the lock and not re-acquired after getting the lock).

> The easy workaround is acquiring a AccessShareLock in the B transaction 
> separately.

I know - I've mentioned explicit locking as a possible solution in my
first message, although it would make the whole process more complex.

The question is whether that should be necessary or whether the 9.2
should behave the same as 9.1.



Tomas

Re: change in LOCK behavior

From

Andres Freund

Date:

10 October 2012, 21:31:24

On Wednesday, October 10, 2012 11:23:10 PM Tomas Vondra wrote:
> On 10.10.2012 23:05, Andres Freund wrote:
> > On Wednesday, October 10, 2012 10:43:57 PM Thom Brown wrote:
> >> On 10 October 2012 21:21, Tomas Vondra <tv@fuzzy.cz> wrote:
> >>> Hi,
> >>> 
> >>> I've just noticed a change of LOCK command behavior between 9.1 and
> >>> 9.2, and I'm not sure whether this is expected or not.
> >>> 
> >>> Let's use a very simple table
> >>> 
> >>>   CREATE TABLE x (id INT);
> >>> 
> >>> Say there are two sessions - A and B, where A performs some operations
> >>> on "x" and needs to protect them with an "ACCESS EXCLUSIVE" lock (e.g.
> >>> it might be a pg_bulkload that acquires such locks, and we need to do
> >>> that explicitly on one or two places).
> >>> 
> >>> Session B is attempting to read the data, but is blocked and waits. On
> >>> 9.1 it sees the commited data (which is what we need) but on 9.2 it
> >>> sees only data commited at the time of the lock attemt.
> >>> 
> >>> Example:
> >>> 
> >>> A: BEGIN;
> >>> A: LOCK x IN ACCESS EXCLUSIVE MODE;
> >>> A: INSERT INTO x VALUES (100);
> >>> B: SELECT * FROM x;
> >>> A: COMMIT;
> >>> 
> >>> Now on 9.1, B receives the value "100" while on 9.2 it gets no rows.
> >>> 
> >>> Is this expected? I suspect the snapshot is read at different time or
> >>> something, but I've checked release notes but I haven't seen anything
> >>> relevant.
> >>> 
> >>> Without getting the commited version of data, the locking is somehow
> >>> pointless for us (unless using a different lock, not the table itself).
> >> 
> >> I suspect it's this commit: d573e239f03506920938bf0be56c868d9c3416da
> >> 
> >> http://archives.postgresql.org/pgsql-committers/2011-12/msg00167.php
> > 
> > Very likely, yes. In fact you get the same beaviour in 9.1 if you modify
> > the example slightly:
> > 
> > B: PREPARE foo AS SELECT * FROM x;
> > A: BEGIN;
> > A: LOCK x IN ACCESS EXCLUSIVE MODE;
> > A: INSERT INTO x VALUES (100);
> > B: EXECUTE foo;
> > A: COMMIT;
> > 
> > If you think about it for a second its not that surprising anymore. We
> > start to execute a query, acquire a snapshot for that, and then wait for
> > the locks on the target relations. We continue executing in the same
> > snapshot for the duration of the statement and thus cannot see any of
> > the new rows which committed *after* we assembled our snapshot.
> 
> Yes, that was my guess too (that the snapshot is acquired before asking
> for the lock and not re-acquired after getting the lock).
> 
> > The easy workaround is acquiring a AccessShareLock in the B transaction
> > separately.
> 
> I know - I've mentioned explicit locking as a possible solution in my
> first message, although it would make the whole process more complex.
I read your original statement as if you would want to use a separate lock 
(advisory?) which you don't need.

> The question is whether that should be necessary or whether the 9.2
> should behave the same as 9.1.
Given that 9.1 behaves the same as 9.2 with prepared statements I don't really 
see a convincing argument for changing this from the status quo.

You can hit the same/similar behaviour in 9.1 even if youre not using PREPARE 
although the window isn't too big and you need DML + only an EXCLUSIVE (not 
access exlusive) lock for it.

Greetings,

Andres

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services

Re: change in LOCK behavior

From

Tomas Vondra

Date:

10 October 2012, 21:45:45

On 10.10.2012 23:31, Andres Freund wrote:
> On Wednesday, October 10, 2012 11:23:10 PM Tomas Vondra wrote:
>> On 10.10.2012 23:05, Andres Freund wrote:
>>> On Wednesday, October 10, 2012 10:43:57 PM Thom Brown wrote:
>>>> On 10 October 2012 21:21, Tomas Vondra <tv@fuzzy.cz> wrote:
>>>>> Hi,
>>>>>
>>>>> I've just noticed a change of LOCK command behavior between 9.1 and
>>>>> 9.2, and I'm not sure whether this is expected or not.
>>>>>
>>>>> Let's use a very simple table
>>>>>
>>>>>   CREATE TABLE x (id INT);
>>>>>
>>>>> Say there are two sessions - A and B, where A performs some operations
>>>>> on "x" and needs to protect them with an "ACCESS EXCLUSIVE" lock (e.g.
>>>>> it might be a pg_bulkload that acquires such locks, and we need to do
>>>>> that explicitly on one or two places).
>>>>>
>>>>> Session B is attempting to read the data, but is blocked and waits. On
>>>>> 9.1 it sees the commited data (which is what we need) but on 9.2 it
>>>>> sees only data commited at the time of the lock attemt.
>>>>>
>>>>> Example:
>>>>>
>>>>> A: BEGIN;
>>>>> A: LOCK x IN ACCESS EXCLUSIVE MODE;
>>>>> A: INSERT INTO x VALUES (100);
>>>>> B: SELECT * FROM x;
>>>>> A: COMMIT;
>>>>>
>>>>> Now on 9.1, B receives the value "100" while on 9.2 it gets no rows.
>>>>>
>>>>> Is this expected? I suspect the snapshot is read at different time or
>>>>> something, but I've checked release notes but I haven't seen anything
>>>>> relevant.
>>>>>
>>>>> Without getting the commited version of data, the locking is somehow
>>>>> pointless for us (unless using a different lock, not the table itself).
>>>>
>>>> I suspect it's this commit: d573e239f03506920938bf0be56c868d9c3416da
>>>>
>>>> http://archives.postgresql.org/pgsql-committers/2011-12/msg00167.php
>>>
>>> Very likely, yes. In fact you get the same beaviour in 9.1 if you modify
>>> the example slightly:
>>>
>>> B: PREPARE foo AS SELECT * FROM x;
>>> A: BEGIN;
>>> A: LOCK x IN ACCESS EXCLUSIVE MODE;
>>> A: INSERT INTO x VALUES (100);
>>> B: EXECUTE foo;
>>> A: COMMIT;
>>>
>>> If you think about it for a second its not that surprising anymore. We
>>> start to execute a query, acquire a snapshot for that, and then wait for
>>> the locks on the target relations. We continue executing in the same
>>> snapshot for the duration of the statement and thus cannot see any of
>>> the new rows which committed *after* we assembled our snapshot.
>>
>> Yes, that was my guess too (that the snapshot is acquired before asking
>> for the lock and not re-acquired after getting the lock).
>>
>>> The easy workaround is acquiring a AccessShareLock in the B transaction
>>> separately.
>>
>> I know - I've mentioned explicit locking as a possible solution in my
>> first message, although it would make the whole process more complex.
> I read your original statement as if you would want to use a separate lock 
> (advisory?) which you don't need.

Oh yeah, right. Any lock would work - advisory or not.

>> The question is whether that should be necessary or whether the 9.2
>> should behave the same as 9.1.
>
> Given that 9.1 behaves the same as 9.2 with prepared statements I don't really 
> see a convincing argument for changing this from the status quo.

Well, equally it's not an argument for the 9.2 behavior, I guess. I'm
not convinced this is a bug (partly because I haven't found any explicit
statement regarding this in the docs), that's why I started this thread
instead of spamming pgsql-bugs.

For us (our app) this means we'll need to make it a bit more complex,
add some more explicit locking that we did not need in 9.1. Acquiring an
Access Share lock explicitly feels a bit strange, because that's the
lock acquired by SELECT statement anyway.

The only difference seems to be that the snapshot is not reacquired
after obtaining the lock. Which may or may not be the right thing,
depending on the definition of when the query was executed (when asking
for the lock or after obtaining it?)

Anyway, this seems to me like a behavior change that might bite many
others, unknowingly depending on the 9.1-like behavior and I believe
it's worth mentioning somewhere - not sure where.

> You can hit the same/similar behaviour in 9.1 even if youre not using PREPARE 
> although the window isn't too big and you need DML + only an EXCLUSIVE (not 
> access exlusive) lock for it.

Probably yes, but we're not doing that so I haven't noticed that.

Tomas

Re: change in LOCK behavior

From

Andres Freund

Date:

10 October 2012, 21:57:56

On Wednesday, October 10, 2012 11:45:41 PM Tomas Vondra wrote:
> On 10.10.2012 23:31, Andres Freund wrote:
> > On Wednesday, October 10, 2012 11:23:10 PM Tomas Vondra wrote:
> >> On 10.10.2012 23:05, Andres Freund wrote:
> >>> On Wednesday, October 10, 2012 10:43:57 PM Thom Brown wrote:
> >>>> On 10 October 2012 21:21, Tomas Vondra <tv@fuzzy.cz> wrote:
> >>>>> Hi,
> >>>>> 
> >>>>> I've just noticed a change of LOCK command behavior between 9.1 and
> >>>>> 9.2, and I'm not sure whether this is expected or not.
> >>>>> 
> >>>>> Let's use a very simple table
> >>>>> 
> >>>>>   CREATE TABLE x (id INT);
> >>>>> 
> >>>>> Say there are two sessions - A and B, where A performs some
> >>>>> operations on "x" and needs to protect them with an "ACCESS
> >>>>> EXCLUSIVE" lock (e.g. it might be a pg_bulkload that acquires such
> >>>>> locks, and we need to do that explicitly on one or two places).
> >>>>> 
> >>>>> Session B is attempting to read the data, but is blocked and waits.
> >>>>> On 9.1 it sees the commited data (which is what we need) but on 9.2
> >>>>> it sees only data commited at the time of the lock attemt.
> >>>>> 
> >>>>> Example:
> >>>>> 
> >>>>> A: BEGIN;
> >>>>> A: LOCK x IN ACCESS EXCLUSIVE MODE;
> >>>>> A: INSERT INTO x VALUES (100);
> >>>>> B: SELECT * FROM x;
> >>>>> A: COMMIT;
> >>>>> 
> >>>>> Now on 9.1, B receives the value "100" while on 9.2 it gets no rows.
> >>>>> 
> >>>>> Is this expected? I suspect the snapshot is read at different time or
> >>>>> something, but I've checked release notes but I haven't seen anything
> >>>>> relevant.
> >>>>> 
> >>>>> Without getting the commited version of data, the locking is somehow
> >>>>> pointless for us (unless using a different lock, not the table
> >>>>> itself).
> >>>> 
> >>>> I suspect it's this commit: d573e239f03506920938bf0be56c868d9c3416da
> >>>> 
> >>>> http://archives.postgresql.org/pgsql-committers/2011-12/msg00167.php
> >>> 
> >>> Very likely, yes. In fact you get the same beaviour in 9.1 if you
> >>> modify the example slightly:
> >>> 
> >>> B: PREPARE foo AS SELECT * FROM x;
> >>> A: BEGIN;
> >>> A: LOCK x IN ACCESS EXCLUSIVE MODE;
> >>> A: INSERT INTO x VALUES (100);
> >>> B: EXECUTE foo;
> >>> A: COMMIT;
> >>> 
> >>> If you think about it for a second its not that surprising anymore. We
> >>> start to execute a query, acquire a snapshot for that, and then wait
> >>> for the locks on the target relations. We continue executing in the
> >>> same snapshot for the duration of the statement and thus cannot see
> >>> any of the new rows which committed *after* we assembled our snapshot.
> >> 
> >> Yes, that was my guess too (that the snapshot is acquired before asking
> >> for the lock and not re-acquired after getting the lock).
> >> 
> >>> The easy workaround is acquiring a AccessShareLock in the B transaction
> >>> separately.
> >> 
> >> I know - I've mentioned explicit locking as a possible solution in my
> >> first message, although it would make the whole process more complex.
> > 
> > I read your original statement as if you would want to use a separate
> > lock (advisory?) which you don't need.
> 
> Oh yeah, right. Any lock would work - advisory or not.
Well, it needs to be a lock youre conflicting on, not any lock ;)

> >> The question is whether that should be necessary or whether the 9.2
> >> should behave the same as 9.1.
> > 
> > Given that 9.1 behaves the same as 9.2 with prepared statements I don't
> > really see a convincing argument for changing this from the status quo.
> 
> Well, equally it's not an argument for the 9.2 behavior, I guess. I'm
> not convinced this is a bug (partly because I haven't found any explicit
> statement regarding this in the docs), that's why I started this thread
> instead of spamming pgsql-bugs.
> 
> For us (our app) this means we'll need to make it a bit more complex,
> add some more explicit locking that we did not need in 9.1. Acquiring an
> Access Share lock explicitly feels a bit strange, because that's the
> lock acquired by SELECT statement anyway.
Yea, but its acquired *after* the snapshot is taken. And again, thats what 
happened in 9.1 as well. Just that *another* snapshot was just for planning the 
query which by also needs to lock the table in share mode. So after the lock 
was taken for planning a new snapshot was acquired for execution... Thats not 
the case anymore in simpler cases.

> The only difference seems to be that the snapshot is not reacquired
> after obtaining the lock. Which may or may not be the right thing,
> depending on the definition of when the query was executed (when asking
> for the lock or after obtaining it?)
You can't generally reacquire snapshots after waiting for a lock. For one it 
would be noticeably expensive and for another it would actually result in very 
strange behaviour in queries with multiple tables.

> Anyway, this seems to me like a behavior change that might bite many
> others, unknowingly depending on the 9.1-like behavior and I believe
> it's worth mentioning somewhere - not sure where.
"Locking is not as simple as you (and most of us) thought!" ;)

> > You can hit the same/similar behaviour in 9.1 even if youre not using
> > PREPARE although the window isn't too big and you need DML + only an
> > EXCLUSIVE (not access exlusive) lock for it.
>
> Probably yes, but we're not doing that so I haven't noticed that.
Btw, unrelated to this problem, but why are you access exlusive locking that 
table? Shouldn't an exlusive lock be enough?

Greetings,

Andres
-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services

Re: change in LOCK behavior

From

Tomas Vondra

Date:

10 October 2012, 22:55:15

On 10.10.2012 23:57, Andres Freund wrote:
> On Wednesday, October 10, 2012 11:45:41 PM Tomas Vondra wrote:
>> Oh yeah, right. Any lock would work - advisory or not.
> Well, it needs to be a lock youre conflicting on, not any lock ;)

Oh, yeah, right. I was thinking about acquiring the same advisory lock
in both sessions, but somehow I forgot to mention that.

>> For us (our app) this means we'll need to make it a bit more complex,
>> add some more explicit locking that we did not need in 9.1. Acquiring an
>> Access Share lock explicitly feels a bit strange, because that's the
>> lock acquired by SELECT statement anyway.
> Yea, but its acquired *after* the snapshot is taken. And again, thats what 
> happened in 9.1 as well. Just that *another* snapshot was just for planning the 
> query which by also needs to lock the table in share mode. So after the lock 
> was taken for planning a new snapshot was acquired for execution... Thats not 
> the case anymore in simpler cases.

Yes, exactly what I was suspecting.

>> The only difference seems to be that the snapshot is not reacquired
>> after obtaining the lock. Which may or may not be the right thing,
>> depending on the definition of when the query was executed (when asking
>> for the lock or after obtaining it?)
> You can't generally reacquire snapshots after waiting for a lock. For one it 
> would be noticeably expensive and for another it would actually result in very 
> strange behaviour in queries with multiple tables.

Not sure what strange behaviour would that cause (given that 9.1 did
that if I understand that correctly). More expensive - no doubt about
it, and the commit message mentions that it's an optimization.

>> Anyway, this seems to me like a behavior change that might bite many
>> others, unknowingly depending on the 9.1-like behavior and I believe
>> it's worth mentioning somewhere - not sure where.
> "Locking is not as simple as you (and most of us) thought!" ;)

I've never thought it's simple, there's a lot of things to deal with.
But I was somehow surprised that something that worked fine for quite
long time, suddenly broke on 9.2. Yes, it might be a dependency on
something that was not really guaranteed.

>>> You can hit the same/similar behaviour in 9.1 even if youre not using
>>> PREPARE although the window isn't too big and you need DML + only an
>>> EXCLUSIVE (not access exlusive) lock for it.
>>
>> Probably yes, but we're not doing that so I haven't noticed that.
> Btw, unrelated to this problem, but why are you access exlusive locking that 
> table? Shouldn't an exlusive lock be enough?

Long story short, we don't, pg_bulkload does. And it's also the reason
why we're so unhappy about the change, because the other session is
waiting for the load to complete and then it is served with an old snapshot.

Tomas

Re: change in LOCK behavior

From

Tom Lane

Date:

10 October 2012, 23:10:32

Tomas Vondra <tv@fuzzy.cz> writes:
> On 10.10.2012 22:43, Thom Brown wrote:
>> On 10 October 2012 21:21, Tomas Vondra <tv@fuzzy.cz> wrote:
>>> A: BEGIN;
>>> A: LOCK x IN ACCESS EXCLUSIVE MODE;
>>> A: INSERT INTO x VALUES (100);
>>> B: SELECT * FROM x;
>>> A: COMMIT;
>>> 
>>> Now on 9.1, B receives the value "100" while on 9.2 it gets no rows.
>>> 
>>> Is this expected? I suspect the snapshot is read at different time or
>>> something, but I've checked release notes but I haven't seen anything
>>> relevant.
>>> 
>>> Without getting the commited version of data, the locking is somehow
>>> pointless for us (unless using a different lock, not the table itself).

>> I suspect it's this commit: d573e239f03506920938bf0be56c868d9c3416da
>> http://archives.postgresql.org/pgsql-committers/2011-12/msg00167.php

> Maybe, the description suggests it might be related. I'm still not sure
> whether this is a bug or expected behavior, although the commit clearly
> states that the change shouldn't be user-visible.

Yeah, I think that last is the key point: this patch was sold on the
grounds that it wouldn't cause any interesting user-visible behavioral
change, but your example blows that claim into tiny little pieces.

I'm inclined to think we need to revert this.  The performance gain is
not worth the prospect of breaking a lot of applications that used to
work reliably.  Robert?
        regards, tom lane

Re: change in LOCK behavior

From

Robert Haas

Date:

11 October 2012, 00:18:03

On Wed, Oct 10, 2012 at 7:10 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Yeah, I think that last is the key point: this patch was sold on the
> grounds that it wouldn't cause any interesting user-visible behavioral
> change, but your example blows that claim into tiny little pieces.
>
> I'm inclined to think we need to revert this.  The performance gain is
> not worth the prospect of breaking a lot of applications that used to
> work reliably.  Robert?

Yeah, I have to admit that I didn't consider the possibility that a
lock wait might intervene between the first and second snapshots.  But
something about this isn't making sense to me.  The whole idea of that
patch is that we reuse the parse/plan snapshot at execution time.  If
it's wrong to use a snapshot that might be stale at execution time,
then it's equally wrong to use it for parse/plan.  Surely, parse/plan
also requires a lock before doing anything interesting with the table,
so the problem here must be that we're taking the parse/plan snapshot,
then getting the lock (after waiting), then doing something with the
parse/plan snapshot that fails to notice that the snapshot is stale.
However, in this example, the parse/plan stuff we do with the stale
snapshot fails to be noticeable, but if we continue on to execution
then it is noticeable.

But that sounds more like a happy accident than anything else.
Andres' example upthread shows that a slightly different test case
breaks on 9.1 and 9.2, and I suspect it's possible to construct other
examples that behave in surprising ways on 9.1 but *not* on 9.2, by
looking for a scenario where the fact that parse/plan uses a different
snapshot from execution manifests itself as a user-visible
inconsistency.  Reverting the patch has the advantage of being simple,
not requiring a lot of thought, and restoring the historical behavior,
but I'm not that excited about it otherwise.  It seems to me that the
root of the issue here is that people is not that people expect two
snapshots -- indeed, a number of people strongly supported getting rid
of that behavior at the time -- but rather that they expect the
snapshot to be taken after locks are acquired.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: change in LOCK behavior

From

Tom Lane

Date:

11 October 2012, 00:43:42

Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, Oct 10, 2012 at 7:10 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Yeah, I think that last is the key point: this patch was sold on the
>> grounds that it wouldn't cause any interesting user-visible behavioral
>> change, but your example blows that claim into tiny little pieces.
>> 
>> I'm inclined to think we need to revert this.  The performance gain is
>> not worth the prospect of breaking a lot of applications that used to
>> work reliably.  Robert?

> Yeah, I have to admit that I didn't consider the possibility that a
> lock wait might intervene between the first and second snapshots.  But
> something about this isn't making sense to me.

It's not hard to see what's happening.  The old code was:
* take parse/plan snapshot* parse & plan query (includes taking AccessShare lock on tables)* take execution snapshot*
runquery

Taking the AccessShare lock blocks until the transaction with exclusive
lock commits; therefore, the execution snapshot will see that
transaction's effects.

In the new code, we re-use the parse/plan snapshot for execution, so it
predates the other transaction's commit, so we don't see its effects.

> The whole idea of that
> patch is that we reuse the parse/plan snapshot at execution time.  If
> it's wrong to use a snapshot that might be stale at execution time,
> then it's equally wrong to use it for parse/plan.

Nope, because the parse/plan snapshot is only relevant to planner
estimation purposes.  It doesn't have to be exactly right.  (What does
have to be exactly right is our idea of the table structure, but that's
okay because any catalog consultation we do is with SnapshotNow, and
hence will see any DDL the other transaction might have done.)

> But that sounds more like a happy accident than anything else.

[ shrug... ]  Maybe it's a happy accident and maybe somebody designed it
with malice aforethought many years ago.  But the point is that this
type of thing worked, with 100% reliability, before 9.2.  Now it does
not.  I don't think we can accept that.

> It seems to me that the
> root of the issue here is that people is not that people expect two
> snapshots -- indeed, a number of people strongly supported getting rid
> of that behavior at the time -- but rather that they expect the
> snapshot to be taken after locks are acquired.

Sure.  Maybe we can rejigger things in a way that does that, although
I think the stumbling block is going to be parse-time calls to
user-defined I/O functions for constants --- which might need a
snapshot.  It might be possible to redesign things so that all tables
are locked before we do anything that requires a non-SnapshotNow
snapshot, and then take a single "planning/execution" snapshot.  But
that is not this patch, and would be a lot more invasive than this
patch, and would certainly not be back-patchable to 9.2.

I think we have to revert and go back to the drawing board on this.
        regards, tom lane

Re: change in LOCK behavior

From

Bruce Momjian

Date:

11 October 2012, 01:11:26

On Wed, Oct 10, 2012 at 08:43:34PM -0400, Tom Lane wrote:
> > It seems to me that the
> > root of the issue here is that people is not that people expect two
> > snapshots -- indeed, a number of people strongly supported getting rid
> > of that behavior at the time -- but rather that they expect the
> > snapshot to be taken after locks are acquired.
> 
> Sure.  Maybe we can rejigger things in a way that does that, although
> I think the stumbling block is going to be parse-time calls to
> user-defined I/O functions for constants --- which might need a
> snapshot.  It might be possible to redesign things so that all tables
> are locked before we do anything that requires a non-SnapshotNow
> snapshot, and then take a single "planning/execution" snapshot.  But
> that is not this patch, and would be a lot more invasive than this
> patch, and would certainly not be back-patchable to 9.2.
> 
> I think we have to revert and go back to the drawing board on this.

Is reverting going to adversely affect users who are already using the
9.2 behavior?

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +

Re: change in LOCK behavior

From

Tom Lane

Date:

11 October 2012, 01:29:23

Bruce Momjian <bruce@momjian.us> writes:
> On Wed, Oct 10, 2012 at 08:43:34PM -0400, Tom Lane wrote:
>> I think we have to revert and go back to the drawing board on this.

> Is reverting going to adversely affect users who are already using the
> 9.2 behavior?

In what way would somebody be relying on the 9.2 behavior?
        regards, tom lane

Re: change in LOCK behavior

From

Bruce Momjian

Date:

11 October 2012, 01:34:35

On Wed, Oct 10, 2012 at 09:29:16PM -0400, Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > On Wed, Oct 10, 2012 at 08:43:34PM -0400, Tom Lane wrote:
> >> I think we have to revert and go back to the drawing board on this.
> 
> > Is reverting going to adversely affect users who are already using the
> > 9.2 behavior?
> 
> In what way would somebody be relying on the 9.2 behavior?

I don't know.  I am just asking if an application could be relying on
the 9.2 behavior.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +

Re: change in LOCK behavior

From

Tom Lane

Date:

11 October 2012, 02:01:39

Bruce Momjian <bruce@momjian.us> writes:
> On Wed, Oct 10, 2012 at 09:29:16PM -0400, Tom Lane wrote:
>> In what way would somebody be relying on the 9.2 behavior?

> I don't know.  I am just asking if an application could be relying on
> the 9.2 behavior.

I don't think so.  Robert suggested in the original discussion that
there could be cases where users would notice if the plan snapshot was
different from the execution snapshot, but on reflection I consider that
argument bogus.  We have these categories of snapshot-dependent things
that happen before execution starts:

* System examination of table DDL.  This is all done with SnapshotNow
and after acquiring a lock on the table, so it's secure.

* Evaluation of the input functions for user-defined types and domains
(the latter of which can invoke nearly arbitrary code via CHECK
constraints).  In principle you could imagine that one of these
datatypes or domain check constraints involves looking at the tables
that the surrounding query is going to touch, but I don't think anybody
would consider that good programming style, much less expect that it
would necessarily see exactly the same table state as query execution
does.

* Evaluation of IMMUTABLE functions at plan time.  If such a function
actually cares exactly which snapshot it runs with, then it's not very
immutable, and I feel no compunction about breaking it.

* Evaluation of STABLE functions at plan time for estimation purposes.
Such a function might well get different answers depending on which
snapshot it uses --- but the result is only used for estimation
purposes.  The worst possible consequence is an inferior plan; there is
no correctness issue.

So it seems to me to be pretty difficult to credit that any of these
things would care at all whether they saw the exact same snapshot that
query execution does.  It's even less plausible that somebody would have
created such a dependency in the short time 9.2 has been out, when such
code could not have worked in any prior release.
        regards, tom lane

Re: change in LOCK behavior

From

Bruce Momjian

Date:

11 October 2012, 02:12:48

On Wed, Oct 10, 2012 at 10:01:30PM -0400, Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > On Wed, Oct 10, 2012 at 09:29:16PM -0400, Tom Lane wrote:
> >> In what way would somebody be relying on the 9.2 behavior?
> 
> > I don't know.  I am just asking if an application could be relying on
> > the 9.2 behavior.
> 
> I don't think so.  Robert suggested in the original discussion that
> there could be cases where users would notice if the plan snapshot was
> different from the execution snapshot, but on reflection I consider that
> argument bogus.  We have these categories of snapshot-dependent things
> that happen before execution starts:
> 
> * System examination of table DDL.  This is all done with SnapshotNow
> and after acquiring a lock on the table, so it's secure.
> 
> * Evaluation of the input functions for user-defined types and domains
> (the latter of which can invoke nearly arbitrary code via CHECK
> constraints).  In principle you could imagine that one of these
> datatypes or domain check constraints involves looking at the tables
> that the surrounding query is going to touch, but I don't think anybody
> would consider that good programming style, much less expect that it
> would necessarily see exactly the same table state as query execution
> does.
> 
> * Evaluation of IMMUTABLE functions at plan time.  If such a function
> actually cares exactly which snapshot it runs with, then it's not very
> immutable, and I feel no compunction about breaking it.
> 
> * Evaluation of STABLE functions at plan time for estimation purposes.
> Such a function might well get different answers depending on which
> snapshot it uses --- but the result is only used for estimation
> purposes.  The worst possible consequence is an inferior plan; there is
> no correctness issue.
> 
> So it seems to me to be pretty difficult to credit that any of these
> things would care at all whether they saw the exact same snapshot that
> query execution does.  It's even less plausible that somebody would have
> created such a dependency in the short time 9.2 has been out, when such
> code could not have worked in any prior release.

Thanks.  You have covered all I was asking.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +

Re: change in LOCK behavior

From

Simon Riggs

Date:

11 October 2012, 08:57:39

On 11 October 2012 01:43, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> I think we have to revert and go back to the drawing board on this.

Given that change was also sold on the basis of higher performance, I
suggest we retest performance to check there is a gain. If there is
still a gain, I suggest we add this as a SIGHUP option, default to
off, rather than completely remove it.


I might also observe since the problem only happens with lock waits,
perhaps we can set a flag can_reuse_snapshot that gets cleared if we
need to perform a lock wait before executing the main statement.

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services

Re: change in LOCK behavior

From

Tom Lane

Date:

11 October 2012, 16:53:51

Simon Riggs <simon@2ndQuadrant.com> writes:
> On 11 October 2012 01:43, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I think we have to revert and go back to the drawing board on this.

> Given that change was also sold on the basis of higher performance, I
> suggest we retest performance to check there is a gain. If there is
> still a gain, I suggest we add this as a SIGHUP option, default to
> off, rather than completely remove it.

I'm not in favor of adding a GUC for this.  The right fix is to redesign
the locking/snapshotting process, not expose its warts in bizarre little
knobs that make users deal with the tradeoffs.

Maybe what we really need is to find a way to make taking a snapshot a
lot cheaper, such that the whole need for this patch goes away.  We're
not going to get far with the idea of making SnapshotNow MVCC-safe
unless it becomes a lot cheaper to get an MVCC snapshot.  I recall some
discussion of trying to reduce a snapshot to a WAL offset --- did that
idea crash and burn, or is it still viable?

Anyway, I believe that for now we ought to revert and rethink, not look
for band-aid ways of preserving this patch.
        regards, tom lane

Re: change in LOCK behavior

From

Simon Riggs

Date:

11 October 2012, 17:10:40

On 11 October 2012 17:53, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
>> On 11 October 2012 01:43, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> I think we have to revert and go back to the drawing board on this.
>
>> Given that change was also sold on the basis of higher performance, I
>> suggest we retest performance to check there is a gain. If there is
>> still a gain, I suggest we add this as a SIGHUP option, default to
>> off, rather than completely remove it.
>
> I'm not in favor of adding a GUC for this.  The right fix is to redesign
> the locking/snapshotting process, not expose its warts in bizarre little
> knobs that make users deal with the tradeoffs.

While I agree with that thought, I'd like to try a little harder than
simply revert.

> Maybe what we really need is to find a way to make taking a snapshot a
> lot cheaper, such that the whole need for this patch goes away.  We're
> not going to get far with the idea of making SnapshotNow MVCC-safe
> unless it becomes a lot cheaper to get an MVCC snapshot.  I recall some
> discussion of trying to reduce a snapshot to a WAL offset --- did that
> idea crash and burn, or is it still viable?

I think that is still at the "fond wish" stage and definitely not
backpatchable in this universe.

> Anyway, I believe that for now we ought to revert and rethink, not look
> for band-aid ways of preserving this patch.

I suggested a way to automatically trigger a second snapshot. I think
that would be acceptable to backpatch.

But since I'm leaving this to you, I'll leave that decision to you as well.

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services

Re: change in LOCK behavior

From

Tom Lane

Date:

11 October 2012, 17:22:55

Simon Riggs <simon@2ndQuadrant.com> writes:
> On 11 October 2012 17:53, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Maybe what we really need is to find a way to make taking a snapshot a
>> lot cheaper, such that the whole need for this patch goes away.  We're
>> not going to get far with the idea of making SnapshotNow MVCC-safe
>> unless it becomes a lot cheaper to get an MVCC snapshot.  I recall some
>> discussion of trying to reduce a snapshot to a WAL offset --- did that
>> idea crash and burn, or is it still viable?

> I think that is still at the "fond wish" stage and definitely not
> backpatchable in this universe.

Of course not.  This performance improvement is simply lost for 9.2.
We can try to do better in future releases, but it's a failed experiment
for now.  Life is hard :-(

> I suggested a way to automatically trigger a second snapshot. I think
> that would be acceptable to backpatch.

If it worked, I might be amenable to that, but it doesn't.  You can't
trigger taking a new snapshot off whether we waited for a lock; that
still has race conditions, just ones that are not so trivial to
demonstrate manually.  (The other transaction might have committed
microseconds before you reach the point of waiting for the lock.)
It would have to be a rule like "take a new snapshot if we acquired
any new lock since the previous snapshot".  While that would work,
we'd end up with no performance gain worth mentioning, since there
would almost always be some lock acquisitions during parsing.
        regards, tom lane

Re: change in LOCK behavior

From

Simon Riggs

Date:

11 October 2012, 18:24:25

On 11 October 2012 18:22, Tom Lane <tgl@sss.pgh.pa.us> wrote:

>> I suggested a way to automatically trigger a second snapshot. I think
>> that would be acceptable to backpatch.
>
> If it worked, I might be amenable to that, but it doesn't.  You can't
> trigger taking a new snapshot off whether we waited for a lock; that
> still has race conditions, just ones that are not so trivial to
> demonstrate manually.  (The other transaction might have committed
> microseconds before you reach the point of waiting for the lock.)
> It would have to be a rule like "take a new snapshot if we acquired
> any new lock since the previous snapshot".  While that would work,
> we'd end up with no performance gain worth mentioning, since there
> would almost always be some lock acquisitions during parsing.

So where's the race?

AFAICS it either waits or it doesn't - the code isn't vague on that
point. If we wait we set the flag.

The point is that lock waits are pretty rare since most locks are
compatible, so triggering a second snap if we waited is not any kind
of problem, even if we waited for a very short time.

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services

Re: change in LOCK behavior

From

Robert Haas

Date:

11 October 2012, 18:36:03

On Thu, Oct 11, 2012 at 2:23 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> So where's the race?
>
> AFAICS it either waits or it doesn't - the code isn't vague on that
> point. If we wait we set the flag.
>
> The point is that lock waits are pretty rare since most locks are
> compatible, so triggering a second snap if we waited is not any kind
> of problem, even if we waited for a very short time.

That actually wouldn't fix the problem, because we might have this scenario:

1. We take a snapshot.
2. Some other process commits, clearing its XID from its PGPROC and
releasing locks.
3. We take a lock.

:-(

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: change in LOCK behavior

From

Tom Lane

Date:

11 October 2012, 18:42:10

Simon Riggs <simon@2ndQuadrant.com> writes:
> On 11 October 2012 18:22, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> If it worked, I might be amenable to that, but it doesn't.  You can't
>> trigger taking a new snapshot off whether we waited for a lock; that
>> still has race conditions, just ones that are not so trivial to
>> demonstrate manually.  (The other transaction might have committed
>> microseconds before you reach the point of waiting for the lock.)

> So where's the race?

Same example as before, except that the exclusive-lock-holding
transaction commits (and releases its lock) between the time that the
other transaction takes its parse/plan snapshot and the time that it
takes AccessShare lock on the table.
        regards, tom lane

Re: change in LOCK behavior

From

Simon Riggs

Date:

11 October 2012, 18:48:54

On 11 October 2012 19:36, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Oct 11, 2012 at 2:23 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> So where's the race?
>>
>> AFAICS it either waits or it doesn't - the code isn't vague on that
>> point. If we wait we set the flag.
>>
>> The point is that lock waits are pretty rare since most locks are
>> compatible, so triggering a second snap if we waited is not any kind
>> of problem, even if we waited for a very short time.
>
> That actually wouldn't fix the problem, because we might have this scenario:
>
> 1. We take a snapshot.
> 2. Some other process commits, clearing its XID from its PGPROC and
> releasing locks.
> 3. We take a lock.

Hmm, so now the patch author thinks his patch is not just broken with
respect to lock waits, but in all cases? Surely the above race
condition is obvious, now and before. Why is it suddenly unacceptable?
(If you believe that, why on earth did you commit?)

Whenever you take a snapshot things can change before you start using
it. And as a result all previous releases of Postgres suffer this
problem. Ergo, we should defer taking the snapshot until the very last
point when we need to use it. Why is that *not* being suggested here?

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services

Re: change in LOCK behavior

From

Simon Riggs

Date:

11 October 2012, 18:54:55

On 11 October 2012 19:41, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
>> On 11 October 2012 18:22, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> If it worked, I might be amenable to that, but it doesn't.  You can't
>>> trigger taking a new snapshot off whether we waited for a lock; that
>>> still has race conditions, just ones that are not so trivial to
>>> demonstrate manually.  (The other transaction might have committed
>>> microseconds before you reach the point of waiting for the lock.)
>
>> So where's the race?
>
> Same example as before, except that the exclusive-lock-holding
> transaction commits (and releases its lock) between the time that the
> other transaction takes its parse/plan snapshot and the time that it
> takes AccessShare lock on the table.

A cache invalidation could also set the need-second-snapshot flag.

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services

Re: change in LOCK behavior

From

Robert Haas

Date:

11 October 2012, 19:25:34

On Thu, Oct 11, 2012 at 2:48 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> Hmm, so now the patch author thinks his patch is not just broken with
> respect to lock waits, but in all cases? Surely the above race
> condition is obvious, now and before. Why is it suddenly unacceptable?
> (If you believe that, why on earth did you commit?)
>
> Whenever you take a snapshot things can change before you start using
> it. And as a result all previous releases of Postgres suffer this
> problem.

I agree with this.  I think that the reversion that Tom is pushing for
is fixing a very special case of a more general problem.  Now, it
might be a sufficiently important special-case that we ought to go and
do the revert, but if your argument is that this is an unsatisfying
band-aid, I couldn't agree more.  It appears to me that this
discussion is exposing the fact that we really haven't given much
thought to when snapshots ought to be taken or to arranging the order
of operations so that they are taken as late as is safely possible.

> Ergo, we should defer taking the snapshot until the very last
> point when we need to use it. Why is that *not* being suggested here?

Well, that's actually not safe either, at least if taken literally.
For example, you can't do part of a sequential scan without a snapshot
just because all the tuples are frozen, and then take a snapshot when
you hit a unfrozen xmin/xmax.  Both you and I previously came up with
that idea independently, and it would save a lot of cycles, but it's
unsafe.  Some other backend can update a tuple after you look at it,
and then commit.  When you get to the new tuple, you'll already have
returned the old tuple, and your new snapshot - since it's taken after
the commit - will also see the new tuple.

So we have to take the snapshot before you begin execution, but it
seems that to avoid surprising behavior we also have to take it after
acquiring locks.  And it looks like locking is intertwined with a
bunch of other parse analysis tasks that might require a snapshot to
have been taken first.  Whee.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: change in LOCK behavior

From

Simon Riggs

Date:

11 October 2012, 19:34:03

On 11 October 2012 20:25, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Oct 11, 2012 at 2:48 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> Hmm, so now the patch author thinks his patch is not just broken with
>> respect to lock waits, but in all cases? Surely the above race
>> condition is obvious, now and before. Why is it suddenly unacceptable?
>> (If you believe that, why on earth did you commit?)
>>
>> Whenever you take a snapshot things can change before you start using
>> it. And as a result all previous releases of Postgres suffer this
>> problem.
>
> I agree with this.  I think that the reversion that Tom is pushing for
> is fixing a very special case of a more general problem.  Now, it
> might be a sufficiently important special-case that we ought to go and
> do the revert, but if your argument is that this is an unsatisfying
> band-aid, I couldn't agree more.  It appears to me that this
> discussion is exposing the fact that we really haven't given much
> thought to when snapshots ought to be taken or to arranging the order
> of operations so that they are taken as late as is safely possible.
>
>> Ergo, we should defer taking the snapshot until the very last
>> point when we need to use it. Why is that *not* being suggested here?
>
> Well, that's actually not safe either, at least if taken literally.

Right - I didn't mean that far - since it was me suggested deferring
snapshots previously I'm aware that doesn't work.

But immediately before we read the first heap row... since we might
need to wait on index access, heap I/O etc.

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services

Re: change in LOCK behavior

From

Tom Lane

Date:

11 October 2012, 19:43:13

Robert Haas <robertmhaas@gmail.com> writes:
> So we have to take the snapshot before you begin execution, but it
> seems that to avoid surprising behavior we also have to take it after
> acquiring locks.  And it looks like locking is intertwined with a
> bunch of other parse analysis tasks that might require a snapshot to
> have been taken first.  Whee.

Yeah.  I think that a good solution to this would involve guaranteeing
that the execution snapshot is not taken until we have all locks that
are going to be taken on the tables.  Which is likely to involve a fair
amount of refactoring, though I admit I've not looked at details.

In any case, it's a mistake to think about this in isolation.  If we're
going to do something about redefining SnapshotNow to avoid its race
conditions, that's going to move the goalposts quite a lot.

Anyway, my feeling about it is that I don't want 9.2 to have an
intermediate behavior between the historical one and whatever we end up
designing to satisfy these concerns.  That's why I'm pressing for
reversion and not a band-aid fix in 9.2.  I certainly hope we can do
better going forward, but this is not looking like whatever we come up
with would be sane to back-patch.
        regards, tom lane

Re: change in LOCK behavior

From

Simon Riggs

Date:

11 October 2012, 19:48:01

On 11 October 2012 20:43, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> So we have to take the snapshot before you begin execution, but it
>> seems that to avoid surprising behavior we also have to take it after
>> acquiring locks.  And it looks like locking is intertwined with a
>> bunch of other parse analysis tasks that might require a snapshot to
>> have been taken first.  Whee.
>
> Yeah.  I think that a good solution to this would involve guaranteeing
> that the execution snapshot is not taken until we have all locks that
> are going to be taken on the tables.  Which is likely to involve a fair
> amount of refactoring, though I admit I've not looked at details.
>
> In any case, it's a mistake to think about this in isolation.  If we're
> going to do something about redefining SnapshotNow to avoid its race
> conditions, that's going to move the goalposts quite a lot.
>
> Anyway, my feeling about it is that I don't want 9.2 to have an
> intermediate behavior between the historical one and whatever we end up
> designing to satisfy these concerns.  That's why I'm pressing for
> reversion and not a band-aid fix in 9.2.  I certainly hope we can do
> better going forward, but this is not looking like whatever we come up
> with would be sane to back-patch.

Agreed, please revert.

-- Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services

Re: change in LOCK behavior

From

Ants Aasma

Date:

16 October 2012, 21:39:38

On Thu, Oct 11, 2012 at 7:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Maybe what we really need is to find a way to make taking a snapshot a
> lot cheaper, such that the whole need for this patch goes away.  We're
> not going to get far with the idea of making SnapshotNow MVCC-safe
> unless it becomes a lot cheaper to get an MVCC snapshot.  I recall some
> discussion of trying to reduce a snapshot to a WAL offset --- did that
> idea crash and burn, or is it still viable?

This was mostly covered in the cheaper snapshots thread. [1] Robert
decided to abandon the idea after concluding that the memory overhead
was untenable with very old snapshots. [2] I had a really hand-wavy
idea of lazily converting snapshots from sequence number based
snapshots to traditional list of xids snapshots to limit the overhead.
That idea was promptly shot down because in that incarnation it needed
snapshots to be stored in shared memory. [3] I have done some more
thinking on this topic, although I have to admit that it has been on
the backburner. It seems to me that the problems are all surmountable.

To recap shortly, the idea is to define visibility and snapshots
through commit sequence numbers (LSNs have problems due to async
commit). The tricky part is the datastructure to support fast
xid-to-csn lookup for visibility checks. To support visibility checks
enough information needs to be kept so that the oldest CSN based
snapshot can resolve its xmin-xmax range to csns. My idea currently is
to have two fixed size shared memory buffers and an overflow log. The
first ring buffer is a dense array mapping of xids to csns. The
overflow entries from the dense ring buffer are checked if they might
be invisible to any CSN based snapshots, and if so inserted into the
sparse buffer. The sparse buffer is a sorted array containing xid-csn
pairs that are still running or are concurrent with an active CSN
based snapshot. Once the sparse buffer is filled up, the smallest
xid-csn pairs are evicted to the CSN log. The long running CSN based
snapshots then need to read this log to build up the
SnapshotData->xip/subxip arrays. The backends can either discover that
their snapshots CSNs values have overflowed by checking the
appropriate horizon value, or be signaled via an interrupt to enable
CSN log cleanup ASAP.

I still have to work out some details on how to handle subtransaction
overflow, how to maintain reasonably fresh values for different
horizons and what are necessary ordering barriers to get lock-free
visibility checks. The idea currently seems workable and will make
taking snapshots really cheap, while the worst case maintenance
overhead is mostly shifted to sessions that acquire lots of writing
transactions and hold snapshots open for a long time.

If anyone is interested I can do a slightly longer write up detailing
what I have worked out so far.

Ants Aasma

[1]
http://archives.postgresql.org/message-id/CA%2BTgmoaAjiq%3Dd%3DkYt3qNj%2BUvi%2BMB-aRovCwr75Ca9egx-Ks9Ag%40mail.gmail.com
[2]
http://archives.postgresql.org/message-id/CA%2BTgmoYD6EhYy1Rb%2BSEuns5smreY1_3rAMeL%3D76rX8deijy56Q%40mail.gmail.com
[3] http://archives.postgresql.org/message-id/CA%2BCSw_uDfg2SBMicGNu13bpr2upbnVL_edoTbzvacR1FrNrZ1g%40mail.gmail.com
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de

Re: change in LOCK behavior

From

Tom Lane

Date:

26 November 2012, 15:57:38

Simon Riggs <simon@2ndQuadrant.com> writes:
> On 11 October 2012 20:43, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Robert Haas <robertmhaas@gmail.com> writes:
>>> So we have to take the snapshot before you begin execution, but it
>>> seems that to avoid surprising behavior we also have to take it after
>>> acquiring locks.  And it looks like locking is intertwined with a
>>> bunch of other parse analysis tasks that might require a snapshot to
>>> have been taken first.  Whee.

>> Yeah.  I think that a good solution to this would involve guaranteeing
>> that the execution snapshot is not taken until we have all locks that
>> are going to be taken on the tables.  Which is likely to involve a fair
>> amount of refactoring, though I admit I've not looked at details.
>> 
>> In any case, it's a mistake to think about this in isolation.  If we're
>> going to do something about redefining SnapshotNow to avoid its race
>> conditions, that's going to move the goalposts quite a lot.
>> 
>> Anyway, my feeling about it is that I don't want 9.2 to have an
>> intermediate behavior between the historical one and whatever we end up
>> designing to satisfy these concerns.  That's why I'm pressing for
>> reversion and not a band-aid fix in 9.2.  I certainly hope we can do
>> better going forward, but this is not looking like whatever we come up
>> with would be sane to back-patch.

> Agreed, please revert.

We have to do something about this one way or another before we can ship
9.2.2.  Is the consensus to revert this patch:
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=d573e239f03506920938bf0be56c868d9c3416da
and if so, who's going to do the deed?
        regards, tom lane

Re: change in LOCK behavior

From

Robert Haas

Date:

26 November 2012, 16:00:38

On Mon, Nov 26, 2012 at 10:55 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
>> On 11 October 2012 20:43, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Robert Haas <robertmhaas@gmail.com> writes:
>>>> So we have to take the snapshot before you begin execution, but it
>>>> seems that to avoid surprising behavior we also have to take it after
>>>> acquiring locks.  And it looks like locking is intertwined with a
>>>> bunch of other parse analysis tasks that might require a snapshot to
>>>> have been taken first.  Whee.
>
>>> Yeah.  I think that a good solution to this would involve guaranteeing
>>> that the execution snapshot is not taken until we have all locks that
>>> are going to be taken on the tables.  Which is likely to involve a fair
>>> amount of refactoring, though I admit I've not looked at details.
>>>
>>> In any case, it's a mistake to think about this in isolation.  If we're
>>> going to do something about redefining SnapshotNow to avoid its race
>>> conditions, that's going to move the goalposts quite a lot.
>>>
>>> Anyway, my feeling about it is that I don't want 9.2 to have an
>>> intermediate behavior between the historical one and whatever we end up
>>> designing to satisfy these concerns.  That's why I'm pressing for
>>> reversion and not a band-aid fix in 9.2.  I certainly hope we can do
>>> better going forward, but this is not looking like whatever we come up
>>> with would be sane to back-patch.
>
>> Agreed, please revert.
>
> We have to do something about this one way or another before we can ship
> 9.2.2.  Is the consensus to revert this patch:
> http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=d573e239f03506920938bf0be56c868d9c3416da
> and if so, who's going to do the deed?

I was assuming you were going to do it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: change in LOCK behavior

From

Tom Lane

Date:

26 November 2012, 16:12:31

Robert Haas <robertmhaas@gmail.com> writes:
> On Mon, Nov 26, 2012 at 10:55 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> We have to do something about this one way or another before we can ship
>> 9.2.2.  Is the consensus to revert this patch:
>> http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=d573e239f03506920938bf0be56c868d9c3416da
>> and if so, who's going to do the deed?

> I was assuming you were going to do it.

OK, will take care of it.
        regards, tom lane