Thread: Insufficient locking for ALTER DEFAULT PRIVILEGES

Insufficient locking for ALTER DEFAULT PRIVILEGES

From
Vik Fearing
Date:
I came across the following bug this week:

Session 0:
begin;
create schema bug;
alter default privileges in schema bug grant all on tables to postgres;
commit;

Session 1:
begin;
alter default privileges in schema bug grant all on tables to postgres;

Session 2:
alter default privileges in schema bug grant all on tables to postgres;
<hangs>

Session 1:
commit;

Session 2:
ERROR:  tuple concurrently updated
-- 
Vik Fearing                                          +33 6 46 75 15 36
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support



Re: Insufficient locking for ALTER DEFAULT PRIVILEGES

From
Alvaro Herrera
Date:
Vik Fearing wrote:

> Session 1:
> begin;
> alter default privileges in schema bug grant all on tables to postgres;
> 
> Session 2:
> alter default privileges in schema bug grant all on tables to postgres;
> <hangs>
> 
> Session 1:
> commit;
> 
> Session 2:
> ERROR:  tuple concurrently updated

So it turns out we don't have any locking here at all.  I don't believe
we have it for all object types, but in most cases it's not as obnoxious
as this one.  But at least for relations we have some nice coding in
RangeVarGetRelidExtended and RangeVarGetAndCheckCreationNamespace that
protect things.

I was thinking of adding some similar locking-and-looping logic in
StoreDefaultACL: grab the tuple from catalogs, LockDatabaseObject()
using the OID of the tuple so obtained; check the sinval counter like
RangeVarGetRelidExtended, if no change we're okay; if it changed, go
grab the OID once again, and if it changed, restart from the top; if
the OID did not change, then we're done.

This sounds complicated, but it's actually reasonably straightforward
and contained within a single routine.

But then, it doesn't handle the case that two transactions try to start
a row for the same combination at the same time.  One of them is going
to get the heap_insert() call to succeed and the other one is going to
get an ugly error message.

I wonder if I'm over-thinking this.  Other thoughts?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Insufficient locking for ALTER DEFAULT PRIVILEGES

From
Alvaro Herrera
Date:
Alvaro Herrera wrote:

> So it turns out we don't have any locking here at all.  I don't believe
> we have it for all object types, but in most cases it's not as obnoxious
> as this one.  But at least for relations we have some nice coding in
> RangeVarGetRelidExtended and RangeVarGetAndCheckCreationNamespace that
> protect things.

Now that I actually check with a non-relation object, I see pretty much
the same error.  So probably if instead of some narrow bug fix what we
need is some general solution for all object types.  I know this has
been discussed a number of times ...  Anyway I see now that we should
not consider this a backpatchable bug fix, and I'm not doing the coding
either, at least not now.

Session 1:

alvherre=# begin;
BEGIN
alvherre=# create or replace function f() returns int language plpgsql strict as $$ begin return 2; end; $$;
CREATE FUNCTION

Session 2:
alvherre=# create or replace function f() returns int language plpgsql strict as $$ begin return 3; end; $$;
<blocks>

Session 1:

alvherre=# commit;
COMMIT

Session 2:
ERROR:  tuple concurrently updated



-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Insufficient locking for ALTER DEFAULT PRIVILEGES

From
Alvaro Herrera
Date:
Alvaro Herrera wrote:

> Now that I actually check with a non-relation object, I see pretty much
> the same error.  So probably if instead of some narrow bug fix what we
> need is some general solution for all object types.  I know this has
> been discussed a number of times ...  Anyway I see now that we should
> not consider this a backpatchable bug fix, and I'm not doing the coding
> either, at least not now.

Discussed this with a couple of 2ndQ colleagues and it became evident
that MVCC catalog scans probably make this problem much more prominent.
So historical branches are not affected all that much, but it's a real
issue on 9.4+.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Insufficient locking for ALTER DEFAULT PRIVILEGES

From
Andres Freund
Date:
On 2015-06-21 11:45:24 -0300, Alvaro Herrera wrote:
> Alvaro Herrera wrote:
> 
> > Now that I actually check with a non-relation object, I see pretty much
> > the same error.  So probably if instead of some narrow bug fix what we
> > need is some general solution for all object types.  I know this has
> > been discussed a number of times ...  Anyway I see now that we should
> > not consider this a backpatchable bug fix, and I'm not doing the coding
> > either, at least not now.
> 
> Discussed this with a couple of 2ndQ colleagues and it became evident
> that MVCC catalog scans probably make this problem much more prominent.
> So historical branches are not affected all that much, but it's a real
> issue on 9.4+.

Hm. I don't see how those would make a marked difference. The snapshot
for catalogs scan are taken afresh for each scan (unless
cached). There'll probably be some difference, but it'll be small.

Andres



Re: Insufficient locking for ALTER DEFAULT PRIVILEGES

From
Amit Kapila
Date:
On Sat, Jun 20, 2015 at 10:57 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>
> Vik Fearing wrote:
>
> > Session 1:
> > begin;
> > alter default privileges in schema bug grant all on tables to postgres;
> >
> > Session 2:
> > alter default privileges in schema bug grant all on tables to postgres;
> > <hangs>
> >
> > Session 1:
> > commit;
> >
> > Session 2:
> > ERROR:  tuple concurrently updated
>
> So it turns out we don't have any locking here at all.  I don't believe
> we have it for all object types, but in most cases it's not as obnoxious
> as this one.  But at least for relations we have some nice coding in
> RangeVarGetRelidExtended and RangeVarGetAndCheckCreationNamespace that
> protect things.
>
> I was thinking of adding some similar locking-and-looping logic in
> StoreDefaultACL: grab the tuple from catalogs, LockDatabaseObject()
> using the OID of the tuple so obtained; check the sinval counter like
> RangeVarGetRelidExtended, if no change we're okay; if it changed, go
> grab the OID once again, and if it changed, restart from the top; if
> the OID did not change, then we're done.
>
> This sounds complicated, but it's actually reasonably straightforward
> and contained within a single routine.
>
> But then, it doesn't handle the case that two transactions try to start
> a row for the same combination at the same time.  One of them is going
> to get the heap_insert() call to succeed and the other one is going to
> get an ugly error message.
>
> I wonder if I'm over-thinking this.  Other thoughts?
>

This problem won't occur for dml statements (2 sessions trying to update
the tuple in same way as described in above example) and the reason is
that ExecUpdate() has EvalPlanQual() mechanism to avoid this, I am
talking about below code:

ExecUpdate()
{
..
heap_update();

case HeapTupleUpdated:
..
if (!ItemPointerEquals(tupleid, &hufd.ctid))
..
epqslot = EvalPlanQual(estate,
..
}

Now for the similar case in simple_heap_update(), we throw error, can't
we use similar coding pattern (as in ExecUpdate) in simple_heap_update()?


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: Insufficient locking for ALTER DEFAULT PRIVILEGES

From
Robert Haas
Date:
On Sun, Jun 21, 2015 at 11:11 AM, Andres Freund <andres@anarazel.de> wrote:
> On 2015-06-21 11:45:24 -0300, Alvaro Herrera wrote:
>> Alvaro Herrera wrote:
>> > Now that I actually check with a non-relation object, I see pretty much
>> > the same error.  So probably if instead of some narrow bug fix what we
>> > need is some general solution for all object types.  I know this has
>> > been discussed a number of times ...  Anyway I see now that we should
>> > not consider this a backpatchable bug fix, and I'm not doing the coding
>> > either, at least not now.
>>
>> Discussed this with a couple of 2ndQ colleagues and it became evident
>> that MVCC catalog scans probably make this problem much more prominent.
>> So historical branches are not affected all that much, but it's a real
>> issue on 9.4+.
>
> Hm. I don't see how those would make a marked difference. The snapshot
> for catalogs scan are taken afresh for each scan (unless
> cached). There'll probably be some difference, but it'll be small.

Yeah, I think the same.  If those changes introduced a problem we
didn't have before, I'd like to see a reproducible test case.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company