Re: DDL Damage Assessment - Mailing list pgsql-hackers

From Stephen Frost
Subject Re: DDL Damage Assessment
Date
Msg-id 20141002211141.GA28859@tamriel.snowman.net
Whole thread Raw
In response to Re: DDL Damage Assessment  (Peter Geoghegan <pg@heroku.com>)
List pgsql-hackers
* Peter Geoghegan (pg@heroku.com) wrote:
> On Thu, Oct 2, 2014 at 12:40 PM, Stephen Frost <sfrost@snowman.net> wrote:
> > The downside of the 'explain' approach is that the script then has to be
> > modified to put 'explain' in front of everything and then you have to go
> > through each statement and consider it.  Having a 'dry-run' transaction
> > type which then produces a report at the end feels like it'd be both
> > easier to assess the overall implications, and less error-prone as you
> > don't have to prefex every statement with 'explain'.  It might even be
> > possible to have the local "view" of post-alter statements be available
> > inside of this 'dry-run' option- that is, if you add a column in the
> > transaction then the column exists to the following commands, so it
> > doesn't just error out.  Having 'explain <whatever>' wouldn't give you
> > that and so you really wouldn't be able to have whole scripts run by
> > just pre-pending each command with 'explain'.
>
> It's kind of tricky to implement a patch to figure this out ahead of
> time. Some of the actual lock acquisitions are well hidden, in terms
> of how the code is structured. In others cases, it may not even be
> possible to determine ahead of time exactly what locks will be taken.

I was thinking this would be a new kind of transaction and we'd have to
teach parts of the system about it- yes, that's pretty invasive, but
it's at least one approach to consider.

> As Harold mentioned, another idea along the same lines would be to
> decorate DDL with a NOWAIT "no locking assertion" and/or "no rewrite
> assertion".  Basically, if this DDL (or perhaps any DDL, if this is
> implemented as a GUC instead) necessitates a table rewrite (and
> requires an AccessExclusiveLock), throw an error. That's the case that
> most people care about.

The problem I see with this approach is outlined above.  I agree that it
may be independently valuable, but I don't see it as being a solution to
the issue.

> This may not even be good enough, though. Consider:
>
> Session 1 is a long running transaction. Maybe it's a spurious
> idle-in-transaction situation, but it could also be totally
> reasonable. It holds an AccessShareLock on some relation, as long
> running transactions are inclined to do.
>
> Session 2 is our migration. It needs an AccessExclusiveLock to ALTER
> TABLE on the same relation (or whatever). But it doesn't need a
> rewrite, which is good. It comes along and attempts to acquire the
> lock, blocking on session 1.
>
> Session 3 is an innocent bystander. It goes to query the same table in
> an ordinary, routine way - a SELECT statement. Even though session 2's
> lock is not granted yet, session 3 is not at liberty to skip the queue
> and get its own AccessShareLock. The effect is about the same as if
> session 2 did need to hold an AccessExclusiveLock for ages: read
> queries block for a long time. And yet, in theory session 2's impact
> on production should not be minimal, if we consider something like
> EXPLAIN output.
>
> Why is NOWAIT only supported for SET TABLESPACE? I guess it's just a
> particularly bad case. NOWAIT might be the wrong thing for DDL
> generally.

I agree that this is a concern, but this feels to me like a next-step
over top of the "assess the locks required" transaction type which I am
trying to outline.  Specifically, having a way to take the report of
what locks are going to be required and then actaully attempt to acquire
them all (or fail if any can't be granted immediately) would be a
natural next step and a way to start off the actual migration script-
either all get acquired and the script runs to completion, or a lock
isn't granted and the whole thing fails immediately without anything
actually being done.
Thanks,
    Stephen

pgsql-hackers by date:

Previous
From: Peter Geoghegan
Date:
Subject: Re: DDL Damage Assessment
Next
From: Andres Freund
Date:
Subject: Re: TAP test breakage on MacOS X