Thread: Insufficient locking for ALTER DEFAULT PRIVILEGES

From: Vik Fearing
Date: 19 June 2015, 23:09:04

I came across the following bug this week:

Session 0:
begin;
create schema bug;
alter default privileges in schema bug grant all on tables to postgres;
commit;

Session 1:
begin;
alter default privileges in schema bug grant all on tables to postgres;

Session 2:
alter default privileges in schema bug grant all on tables to postgres;
<hangs>

Session 1:
commit;

Session 2:
ERROR:  tuple concurrently updated
-- 
Vik Fearing                                          +33 6 46 75 15 36
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support

Re: Insufficient locking for ALTER DEFAULT PRIVILEGES

From: Alvaro Herrera
Date: 20 June 2015, 18:18:40

Vik Fearing wrote:

> Session 1:
> begin;
> alter default privileges in schema bug grant all on tables to postgres;
> 
> Session 2:
> alter default privileges in schema bug grant all on tables to postgres;
> <hangs>
> 
> Session 1:
> commit;
> 
> Session 2:
> ERROR:  tuple concurrently updated

So it turns out we don't have any locking here at all.  I don't believe
we have it for all object types, but in most cases it's not as obnoxious
as this one.  But at least for relations we have some nice coding in
RangeVarGetRelidExtended and RangeVarGetAndCheckCreationNamespace that
protect things.

I was thinking of adding some similar locking-and-looping logic in
StoreDefaultACL: grab the tuple from catalogs, LockDatabaseObject()
using the OID of the tuple so obtained; check the sinval counter like
RangeVarGetRelidExtended, if no change we're okay; if it changed, go
grab the OID once again, and if it changed, restart from the top; if
the OID did not change, then we're done.

This sounds complicated, but it's actually reasonably straightforward
and contained within a single routine.
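
As a rough illustration of the locking-and-looping idea described above, here is a minimal, self-contained C sketch. It is a toy model and not PostgreSQL code: ToyCatalog, toy_lookup, toy_lock, toy_unlock and lock_row_with_recheck are hypothetical names, the inval_counter field merely stands in for the sinval counter, and "blocking on the lock" is simulated by letting a concurrent change happen inside toy_lock. The loop shape is loosely modeled on the one in RangeVarGetRelidExtended.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t Oid;
#define InvalidOid ((Oid) 0)

/* Toy stand-in for shared catalog state; every name here is hypothetical. */
typedef struct ToyCatalog
{
    Oid      row_oid;          /* OID of the row for our key */
    uint64_t inval_counter;    /* stand-in for the sinval message counter */
    int      pending_changes;  /* concurrent replacements to inject while we wait */
    Oid      locked_oid;       /* OID we currently hold a lock on, if any */
} ToyCatalog;

/* Look up the row's OID for our key (a single row in this toy model). */
static Oid
toy_lookup(const ToyCatalog *cat)
{
    return cat->row_oid;
}

/*
 * Pretend to block on the object lock.  While we "wait", another session may
 * replace the row, which changes its OID and bumps the invalidation counter.
 */
static void
toy_lock(ToyCatalog *cat, Oid oid)
{
    if (cat->pending_changes > 0)
    {
        cat->pending_changes--;
        cat->row_oid++;
        cat->inval_counter++;
    }
    cat->locked_oid = oid;
}

/* Release the lock we hold on the given OID. */
static void
toy_unlock(ToyCatalog *cat, Oid oid)
{
    assert(cat->locked_oid == oid);
    cat->locked_oid = InvalidOid;
}

/*
 * Lock the row for our key, retrying until the OID we locked is still the
 * OID that a fresh lookup returns.  Returns the locked OID; *retries counts
 * how many times the loop had to start over.
 */
Oid
lock_row_with_recheck(ToyCatalog *cat, int *retries)
{
    Oid  old_oid = InvalidOid;
    bool retry = false;

    *retries = 0;
    for (;;)
    {
        uint64_t inval_before = cat->inval_counter;
        Oid      oid = toy_lookup(cat);

        if (retry)
        {
            if (oid == old_oid)
                return oid;             /* the lock we hold is still the right one */
            toy_unlock(cat, old_oid);   /* we locked a stale OID; drop it */
        }

        toy_lock(cat, oid);

        if (inval_before == cat->inval_counter)
            return oid;                 /* nothing changed while we waited */

        retry = true;
        old_oid = oid;
        (*retries)++;
    }
}
```

The recheck after taking the lock is the key step: the lock is taken on whatever OID the first lookup returned, and that OID may already be stale by the time the lock is granted.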

But then, it doesn't handle the case that two transactions try to start
a row for the same combination at the same time.  One of them is going
to get the heap_insert() call to succeed and the other one is going to
get an ugly error message.

I wonder if I'm over-thinking this.  Other thoughts?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Insufficient locking for ALTER DEFAULT PRIVILEGES

From: Alvaro Herrera
Date: 20 June 2015, 18:35:32

Alvaro Herrera wrote:

> So it turns out we don't have any locking here at all.  I don't believe
> we have it for all object types, but in most cases it's not as obnoxious
> as this one.  But at least for relations we have some nice coding in
> RangeVarGetRelidExtended and RangeVarGetAndCheckCreationNamespace that
> protect things.

Now that I actually check with a non-relation object, I see pretty much
the same error.  So probably, instead of some narrow bug fix, what we
need is some general solution for all object types.  I know this has
been discussed a number of times ...  Anyway I see now that we should
not consider this a backpatchable bug fix, and I'm not doing the coding
either, at least not now.

Session 1:

alvherre=# begin;
BEGIN
alvherre=# create or replace function f() returns int language plpgsql strict as $$ begin return 2; end; $$;
CREATE FUNCTION

Session 2:
alvherre=# create or replace function f() returns int language plpgsql strict as $$ begin return 3; end; $$;
<blocks>

Session 1:

alvherre=# commit;
COMMIT

Session 2:
ERROR:  tuple concurrently updated

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Insufficient locking for ALTER DEFAULT PRIVILEGES

From: Alvaro Herrera
Date: 21 June 2015, 14:45:43

Alvaro Herrera wrote:

> Now that I actually check with a non-relation object, I see pretty much
> the same error.  So probably, instead of some narrow bug fix, what we
> need is some general solution for all object types.  I know this has
> been discussed a number of times ...  Anyway I see now that we should
> not consider this a backpatchable bug fix, and I'm not doing the coding
> either, at least not now.

Discussed this with a couple of 2ndQ colleagues and it became evident
that MVCC catalog scans probably make this problem much more prominent.
So historical branches are not affected all that much, but it's a real
issue on 9.4+.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Insufficient locking for ALTER DEFAULT PRIVILEGES

From: Andres Freund
Date: 21 June 2015, 15:12:03

On 2015-06-21 11:45:24 -0300, Alvaro Herrera wrote:
> Alvaro Herrera wrote:
> 
> > Now that I actually check with a non-relation object, I see pretty much
> > the same error.  So probably, instead of some narrow bug fix, what we
> > need is some general solution for all object types.  I know this has
> > been discussed a number of times ...  Anyway I see now that we should
> > not consider this a backpatchable bug fix, and I'm not doing the coding
> > either, at least not now.
> 
> Discussed this with a couple of 2ndQ colleagues and it became evident
> that MVCC catalog scans probably make this problem much more prominent.
> So historical branches are not affected all that much, but it's a real
> issue on 9.4+.

Hm. I don't see how those would make a marked difference. The snapshots
for catalog scans are taken afresh for each scan (unless
cached). There'll probably be some difference, but it'll be small.

Andres

Re: Insufficient locking for ALTER DEFAULT PRIVILEGES

From: Amit Kapila
Date: 22 June 2015, 03:36:47

On Sat, Jun 20, 2015 at 10:57 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>
> Vik Fearing wrote:
>
> > Session 1:
> > begin;
> > alter default privileges in schema bug grant all on tables to postgres;
> >
> > Session 2:
> > alter default privileges in schema bug grant all on tables to postgres;
> > <hangs>
> >
> > Session 1:
> > commit;
> >
> > Session 2:
> > ERROR: tuple concurrently updated
>
> So it turns out we don't have any locking here at all. I don't believe
> we have it for all object types, but in most cases it's not as obnoxious
> as this one. But at least for relations we have some nice coding in
> RangeVarGetRelidExtended and RangeVarGetAndCheckCreationNamespace that
> protect things.
>
> I was thinking of adding some similar locking-and-looping logic in
> StoreDefaultACL: grab the tuple from catalogs, LockDatabaseObject()
> using the OID of the tuple so obtained; check the sinval counter like
> RangeVarGetRelidExtended, if no change we're okay; if it changed, go
> grab the OID once again, and if it changed, restart from the top; if
> the OID did not change, then we're done.
>
> This sounds complicated, but it's actually reasonably straightforward
> and contained within a single routine.
>
> But then, it doesn't handle the case that two transactions try to start
> a row for the same combination at the same time. One of them is going
> to get the heap_insert() call to succeed and the other one is going to
> get an ugly error message.
>
> I wonder if I'm over-thinking this. Other thoughts?
>

This problem won't occur for DML statements (two sessions trying to update
the tuple in the same way as described in the above example), and the reason
is that ExecUpdate() has the EvalPlanQual() mechanism to avoid this. I am
talking about the code below:

ExecUpdate()
{
    ..
    heap_update();
    ..
    case HeapTupleUpdated:
        ..
        if (!ItemPointerEquals(tupleid, &hufd.ctid))
            ..
            epqslot = EvalPlanQual(estate, ..
}

For the similar case in simple_heap_update(), we throw an error. Can't
we use a similar coding pattern (as in ExecUpdate()) in simple_heap_update()?
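
To make the question concrete, the toy C sketch below contrasts the two behaviours: giving up when the target tuple version has already been superseded, versus following the chain to the newest version and retrying there. It is a simplified model under stated assumptions and not PostgreSQL code: ToyHeap, toy_insert, toy_update, update_or_fail and update_following_chain are hypothetical names, and the next array only stands in for the ctid link to a newer tuple version. The real EvalPlanQual() also re-evaluates the query's quals against the newer version, which this sketch omits.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_HEAP_SIZE 8
#define TOY_NO_TID    (-1)

typedef enum { TOY_OK, TOY_UPDATED } ToyResult;

/* Toy append-only heap: each update creates a new version of the tuple. */
typedef struct ToyHeap
{
    int value[TOY_HEAP_SIZE];
    int next[TOY_HEAP_SIZE];   /* tid of the newer version, or TOY_NO_TID */
    int used;                  /* number of slots in use */
} ToyHeap;

/* Append a new tuple version and return its tid. */
static int
toy_insert(ToyHeap *heap, int value)
{
    int tid;

    assert(heap->used < TOY_HEAP_SIZE);
    tid = heap->used++;
    heap->value[tid] = value;
    heap->next[tid] = TOY_NO_TID;
    return tid;
}

/*
 * Try to update the version at tid.  If someone else already superseded it,
 * report TOY_UPDATED and hand back the tid of the newer version instead.
 */
static ToyResult
toy_update(ToyHeap *heap, int tid, int value, int *newer_tid)
{
    if (heap->next[tid] != TOY_NO_TID)
    {
        *newer_tid = heap->next[tid];
        return TOY_UPDATED;
    }
    heap->next[tid] = toy_insert(heap, value);
    *newer_tid = heap->next[tid];
    return TOY_OK;
}

/* First style: a concurrent update is simply treated as a failure. */
bool
update_or_fail(ToyHeap *heap, int tid, int value)
{
    int newer_tid;

    return toy_update(heap, tid, value, &newer_tid) == TOY_OK;
}

/*
 * Second style: on a concurrent update, move to the newer version and try
 * again.  Returns the tid of the version this call created.
 */
int
update_following_chain(ToyHeap *heap, int tid, int value)
{
    int newer_tid;

    while (toy_update(heap, tid, value, &newer_tid) == TOY_UPDATED)
        tid = newer_tid;
    return newer_tid;
}
```

In this model the first style corresponds to reporting an error to the user, while the second silently applies the change on top of whatever the other session committed. Whether that is the right semantics for catalog updates is exactly the open question.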

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: Insufficient locking for ALTER DEFAULT PRIVILEGES

From: Robert Haas
Date: 23 June 2015, 13:50:00

On Sun, Jun 21, 2015 at 11:11 AM, Andres Freund <andres@anarazel.de> wrote:
> On 2015-06-21 11:45:24 -0300, Alvaro Herrera wrote:
>> Alvaro Herrera wrote:
>> > Now that I actually check with a non-relation object, I see pretty much
>> > the same error.  So probably, instead of some narrow bug fix, what we
>> > need is some general solution for all object types.  I know this has
>> > been discussed a number of times ...  Anyway I see now that we should
>> > not consider this a backpatchable bug fix, and I'm not doing the coding
>> > either, at least not now.
>>
>> Discussed this with a couple of 2ndQ colleagues and it became evident
>> that MVCC catalog scans probably make this problem much more prominent.
>> So historical branches are not affected all that much, but it's a real
>> issue on 9.4+.
>
> Hm. I don't see how those would make a marked difference. The snapshots
> for catalog scans are taken afresh for each scan (unless
> cached). There'll probably be some difference, but it'll be small.

Yeah, I think the same.  If those changes introduced a problem we
didn't have before, I'd like to see a reproducible test case.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company