Re: BUG #4925: "select ... for update" doesn't affect rows from sub-query - Mailing list pgsql-bugs

From Robert Haas
Subject Re: BUG #4925: "select ... for update" doesn't affect rows from sub-query
Date
Msg-id 603c8f070907291954ydc9d467m21499dfbf824ee17@mail.gmail.com
Whole thread Raw
In response to BUG #4925: "select ... for update" doesn't affect rows from sub-query  ("Steve Caligo" <steve.caligo@ctie.etat.lu>)
Responses Re: BUG #4925: "select ... for update" doesn't affect rows from sub-query  ("Steve Caligo" <Steve.Caligo@ctie.etat.lu>)
List pgsql-bugs
On Thu, Jul 16, 2009 at 12:34 PM, Steve Caligo<steve.caligo@ctie.etat.lu> w=
rote:
>
> The following bug has been logged online:
>
> Bug reference: =A0 =A0 =A04925
> Logged by: =A0 =A0 =A0 =A0 =A0Steve Caligo
> Email address: =A0 =A0 =A0steve.caligo@ctie.etat.lu
> PostgreSQL version: 8.3.7 and 8.4.0
> Operating system: =A0 Archlinux and Gentoo 8.3.7, Gentoo 8.4.0
> Description: =A0 =A0 =A0 =A0"select ... for update" doesn't affect rows f=
rom
> sub-query
> Details:
>
> While trying to guarantee data consistency when doing concurrent processi=
ng,
> I stumbled upon your cautions mentionned in your documentation (especially
> "limit" ... "for update"):
> http://www.postgresql.org/docs/8.4/static/sql-select.html
>
> I tried working around this limitation and the statement on the same page
> seemed promising to me:
> "If FOR UPDATE or FOR SHARE is applied to a view or sub-query, it affects
> all tables used in the view or sub-query."
>
> But unfortunately the latter statement doesn't seem to be true and the
> subquery isn't protected by row locks, as the following examples show.
>
> 1) INITIAL SITUATION
>
> Create a simple table with some data. No constraints, no indexes, just the
> bare minimum:
>
> create table test (
> =A0 =A0 =A0 =A0id integer,
> =A0 =A0 =A0 =A0name varchar(10),
> =A0 =A0 =A0 =A0c integer
> );
>
> insert into test values
> =A0 =A0 =A0 =A0(1, 'test1', 0),
> =A0 =A0 =A0 =A0(2, 'test2', 0),
> =A0 =A0 =A0 =A0(3, 'test3', 0),
> =A0 =A0 =A0 =A0(4, 'test4', 0),
> =A0 =A0 =A0 =A0(5, 'test5', 0)
> ;
>
>
> 2) FIRST TRY, USING "UPDATE WHERE ID =3D ( SELECT )"
> The goal is to have two clients set their unique ID to a single/different
> row from the table. First, using "limit" in a slightly different way:
>
> 1=3D> begin transaction;
> 2=3D> begin transaction;
>
> 1=3D> update test set c =3D 1 where id =3D (
> =A0 =A0 =A0 =A0select id from test where c =3D 0 order by id limit 1
> )
> ; -- updates row id=3D1
> 2=3D> update test set c =3D 2 where id =3D (
> =A0 =A0 =A0 =A0select id from test where c =3D 0 order by id limit 1
> )
> ; -- forced to wait on lock
>
> 1=3D> commit; -- client #2 continues
> 1=3D> select * from test where id =3D 1;
> =A0id | name =A0| c
> ----+-------+---
> =A01 | test1 | 1
> (1 row)
>
> 2=3D> commit;
> 2=3D> select * from test where id =3D 1;
> =A0id | name =A0| c
> ----+-------+---
> =A01 | test1 | 2
> (1 row)
>
> 1=3D> select * from test where id =3D 1;
> =A0id | name =A0| c
> ----+-------+---
> =A01 | test1 | 2
> (1 row)
>
>
> Conclusion: didn't work.
> Probably because the subquery is executed before the update and not affec=
ted
> by row locking.
>
> Expected behaviour: one client update one row to c=3D1 and the other clie=
nt
> updates a different row to c=3D2.

This is pretty clearly NOT the situation described in the
documentation.  There is no FOR UPDATE or FOR SHARE anywhere in this
query.  You could argue that we should treat an UPDATE statement as
applying an implicit FOR UPDATE to any subqueries found therein, but
that has nothing to do with whether the current behavior matches the
documentation; it's a discussion of whether the current behavior is
good or bad.

> 3) SECOND TRY, PROTECTING THE UPDATE BY AN ADDITIONAL "SELECT ... FOR
> UPDATE" AND AVOIDING THE "LIMIT"
>
> 1=3D> begin transaction;
> 2=3D> begin transaction;
> 1=3D> select id from test where id =3D (
> =A0 =A0 =A0 =A0select min(id) from test where c =3D 0
> ) for update;
> =A0id
> ----
> =A02
> (1 row)
>
> 2=3D> select id from test where id =3D (
> =A0 =A0 =A0 =A0select min(id) from test where c =3D 0
> ) for update; -- forced to wait on lock
>
> 1=3D> update test set c =3D 1 where id =3D 2;
> 1=3D> commit; -- client #2 continues:
> 2=3D> -- client #2 outputs:
> =A0id
> ----
> =A02
> (1 row)
>
> 2=3D> select * from test where id =3D 2;
> =A0id | name =A0| c
> ----+-------+---
> =A02 | test2 | 1
> (1 row)
> 2=3D> -- now this isn't what we initially asked for, let's just repeat the
> query once more:
>
> 2=3D> select id from test where id =3D ( select min(id) from test where c=
 =3D 0 )
> for update;
> =A0id
> ----
> =A03
> (1 row)
>
>
> Conclusion: didn't work.
> The situation one ends up in is one that contradicts your above statement,
> but also seemingly violates the "I" in ACID. In 3), client #2 is clearly
> affected by the actions of client #1. While serialized transactions or fu=
ll
> table locks would avoid this race condition, it either requires large
> changes in the application or impacts performance during contention.

This is pretty weird behavior, and I am among those who think it
sucks.  But it is documented.

http://www.postgresql.org/docs/8.4/static/transaction-iso.html#MVCC-SERIALI=
ZABILITY

An interesting fact is that if you stick another "for update" into the
subquery here, the command will fail utterly, with the following error
message:

ERROR:  SELECT FOR UPDATE/SHARE is not allowed with aggregate functions

> Adding an additional "c =3D 0" to the main query of 3) of course suppress=
es
> the row from the concurrent update, but it shouldn't have been returned w=
ith
> a value of "c =3D 1" because the transaction #2 started prior to the upda=
te
> statement of #1.

...Robert

pgsql-bugs by date:

Previous
From: Scott Mead
Date:
Subject: Re: BUG #4950: Problem in Job scheduling in Advanced postgre sql
Next
From: Robert Haas
Date:
Subject: Re: BUG #4925: "select ... for update" doesn't affect rows from sub-query