Re: ALTER TABLE lock strength reduction patch is unsafe - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: ALTER TABLE lock strength reduction patch is unsafe
Date
Msg-id BANLkTinkzstqMRer81D_8y_Q+NrkwWt=gg@mail.gmail.com
Whole thread Raw
In response to Re: ALTER TABLE lock strength reduction patch is unsafe  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: ALTER TABLE lock strength reduction patch is unsafe
List pgsql-hackers
On Tue, Jun 21, 2011 at 4:12 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
>> On Mon, Jun 20, 2011 at 10:56 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> The ALTER TABLE patch
>>> has greatly expanded the scope of the issue, and that *is* a regression
>>> compared to prior releases.
>
>> I agree the scope for RELOID errors increased with my 9.1 patch. I'm
>> now happy with the locking patch (attached), which significantly
>> reduces the scope - back to the original error scope, in my testing.
>
>> I tried to solve both, but I think that's a step too far given the timing.
>
>> It seems likely that there will be objections to this patch.
>
> Yup, you're right.  Having read this patch, I have absolutely zero
> confidence in it.  It introduces some locks in random places, with no
> rhyme or reason that I can see.

There is something to be salvaged here and I have additional analysis
and suggestions. Please read all the way through.

We already allowed INHERIT/dis INHERIT with a lower lock level than
AccessExclusiveLock and have received no complaints to speak of. We
aren't discussing moving that operation to AccessExclusiveLock in the
light of this discussion, nor are we discussing TYPE OF operations. I
don't think its sensible to reset those operations to
AccessExclusiveLock.

The patch has extensive comments that explain the carefully analysed
and tested locations for the locking.

We take an ExclusiveLock on the RelationDefinition during an ALTER TABLE.
We take a ShareLock on CatCache and RelCache rebuilds.

The only time these can block is during an ALTER TABLE with a lower
lock than AccessExclusiveLock that occurs immediately after other DDL.
The locking has been placed to minimise the lock window for simple
ALTER TABLE statements.

> There is no reason to think that this
> is a complete solution, and considerable reason to think that it isn't
> (notably, the RELOID syscache is hardly the only one at risk).

The whole of the relcache rebuild is protected, not just access to
pg_class, so that is safe, no matter which catalog tables are
accessed.

Catcache does look more complex, so your fears here are worth looking
into. I was already doing that since my last post and have now
finished.

Based on the above, I think we should limit the patch to these areas:

1. Simple operations on pg_class and/or pg_attribute. This includes
                        case AT_SetStatistics:   ATTNAME catcache
            case AT_SetOptions:     ATTNAME catcache
            case AT_ResetOptions:  ATTNAME catcache
            case AT_SetStorage:    ATTNAME catcache
            case AT_SetRelOptions: RELOID catcache
            case AT_ResetRelOptions: RELOID catcache

2. FK VALIDATION which touches CONSTROID

Limiting the scope of the patch to those areas provides good benefit,
whilst at the same time limiting our risk footprint. Doing that means
we can evaluate the patch against those aspects.

For people that still think there is still risk, I've added a
parameter called "relaxed_ddl_locking" which defaults to "off". That
way people can use this feature in an informed way without exposing us
to support risks from its use.

[Patch needs minor docs and test documentation changes prior to commit]

> Worse,
> it's adding more locking in performance-critical places, which seems
> to me to severely degrade the argument for the original feature,
> namely that it was supposed to give us *less* locking.

I think we should be clear that this adds *one* lock/unlock for each
object for each session, ONCE only after each transaction that
executed a DDL statement against an object. So with 200 sessions, a
typical ALTER TABLE statement will cause 400 locks. The current lock
manager maxes out above 50,000 locks per second, so any comparative
analysis will show this is a minor blip on even a heavy workload.

It isn't in a performance critical place and the lock held in case of
a rebuild is a ShareLock so won't block, accept against an ALTER TABLE
statement.

This is a marked *reduction* in locking overhead, in comparison with
ALTER TABLE being an AccessExclusiveLock which would make the SQL
statement wait for potentially a very long time.

With relaxed_ddl_locking = off (default) there is zero effect.

So that is not a valid argument to refuse this patch.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: smallserial / serial2
Next
From: Tom Lane
Date:
Subject: Re: smallserial / serial2