Thread: DDL Damage Assessment

DDL Damage Assessment

From
Dimitri Fontaine
Date:
Hi fellow hackers,

I would like to work on a new feature allowing our users to assess the
amount of trouble they will run into when running a DDL script on their
production setups, *before* actually getting their services down.

The main practical example I can offer here is the ALTER TABLE command.
Recent releases are including very nice optimisations to it, so much so
that it's becoming increasingly hard to answer some very basic
questions:
  - what kind of locks will be taken? (exclusive, shared)
  - on what objects? (foreign keys, indexes, sequences, etc)
  - will the table have to be rewritten? the indexes?

Of course the docs are answering parts of those, but in particular the
table rewriting rules are complex enough that “accidental DBAs” will
fail to predict if the target data type is binary coercible to the
current one.
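
To make the asymmetry concrete, here is a small pair of commands (table and
column names invented, behaviour as I understand the recent ALTER TABLE
optimisations) that look almost identical but behave very differently:

    -- varchar is binary coercible to text: recent releases skip the rewrite
    ALTER TABLE emp ALTER COLUMN name TYPE text;

    -- int4 -> int8 is not binary coercible: full table (and index) rewrite
    ALTER TABLE emp ALTER COLUMN emp_id TYPE bigint;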

Questions:
 1. Do you agree that a systematic way to report what a DDL command (or
    script, or transaction) is going to do on your production database
    is a feature we should provide to our growing user base?

 2. What do you think such a feature should look like?

 3. Does it make sense to support the whole set of DDL commands from the
    get go (or ever) when most of them are only taking locks in their
    own pg_catalog entry anyway?

Provided that we are able to converge towards a common enough answer to
those questions, I propose to hack my way around and send patches to
have it (the common answer) available in the next PostgreSQL release.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support



Re: DDL Damage Assessment

From
Fabrízio de Royes Mello
Date:


On Thu, Oct 2, 2014 at 1:30 PM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote:
>
> Hi fellow hackers,
>
> I would like to work on a new feature allowing our users to assess the
> amount of trouble they will run into when running a DDL script on their
> production setups, *before* actually getting their services down.
>
> The main practical example I can offer here is the ALTER TABLE command.
> Recent releases are including very nice optimisations to it, so much so
> that it's becoming increasingly hard to answer some very basic
> questions:
>
>   - what kind of locks will be taken? (exclusive, shared)
>   - on what objects? (foreign keys, indexes, sequences, etc)
>   - will the table have to be rewritten? the indexes?
>
> Of course the docs are answering parts of those, but in particular the
> table rewriting rules are complex enough that “accidental DBAs” will
> fail to predict if the target data type is binary coercible to the
> current one.
>
> Questions:
>
>  1. Do you agree that a systematic way to report what a DDL command (or
>     script, or transaction) is going to do on your production database
>     is a feature we should provide to our growing user base?
>
>  2. What do you think such a feature should look like?
>
>  3. Does it make sense to support the whole set of DDL commands from the
>     get go (or ever) when most of them are only taking locks in their
>     own pg_catalog entry anyway?
>
> Provided that we are able to converge towards a common enough answer to
> those questions, I propose to hack my way around and send patches to
> have it (the common answer) available in the next PostgreSQL release.
>

What you are proposing is some kind of "dry-run" with verbose output?

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Github: http://github.com/fabriziomello

Re: DDL Damage Assessment

From
Claudio Freire
Date:
On Thu, Oct 2, 2014 at 1:46 PM, Fabrízio de Royes Mello
<fabriziomello@gmail.com> wrote:
> On Thu, Oct 2, 2014 at 1:30 PM, Dimitri Fontaine <dimitri@2ndquadrant.fr>
> wrote:
>>
>> Hi fellow hackers,
>>
>> I would like to work on a new feature allowing our users to assess the
>> amount of trouble they will run into when running a DDL script on their
>> production setups, *before* actually getting their services down.
>>
>> The main practical example I can offer here is the ALTER TABLE command.
>> Recent releases are including very nice optimisations to it, so much so
>> that it's becoming increasingly hard to answer some very basic
>> questions:
>>
>>   - what kind of locks will be taken? (exclusive, shared)
>>   - on what objects? (foreign keys, indexes, sequences, etc)
>>   - will the table have to be rewritten? the indexes?
>>
>> Of course the docs are answering parts of those, but in particular the
>> table rewriting rules are complex enough that “accidental DBAs” will
>> fail to predict if the target data type is binary coercible to the
>> current one.
>>
>> Questions:
>>
>>  1. Do you agree that a systematic way to report what a DDL command (or
>>     script, or transaction) is going to do on your production database
>>     is a feature we should provide to our growing user base?
>>
>>  2. What do you think such a feature should look like?
>>
>>  3. Does it make sense to support the whole set of DDL commands from the
>>     get go (or ever) when most of them are only taking locks in their
>>     own pg_catalog entry anyway?
>>
>> Provided that we are able to converge towards a common enough answer to
>> those questions, I propose to hack my way around and send patches to
>> have it (the common answer) available in the next PostgreSQL release.
>>
>
> What you are proposing is some kind of "dry-run" with verbose output?


EXPLAIN ALTER TABLE ?



Re: DDL Damage Assessment

From
Harold Giménez
Date:
I think the main issue is when a table rewrite is triggered on a DDL
command on a large table, as this is what frequently leads to
unavailability. The idea of introducing a NOREWRITE keyword to DDL
commands then came up (credit: Peter Geoghegan). When the NOREWRITE
keyword is used and the DDL statement would rewrite the table, the
command errors and exits.
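
To make that concrete, a sketch of what a guarded statement might look like
(NOREWRITE is purely hypothetical syntax at this point, and the table, column
and error text below are only placeholders):

    -- hypothetical: fail instead of rewriting
    ALTER TABLE users ADD COLUMN active boolean NOT NULL DEFAULT false NOREWRITE;
    -- ERROR:  ALTER TABLE would rewrite table "users", aborting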

This would allow ORM and framework authors to include the NOREWRITE
option by default, only to be disabled on a per-statement basis by the
developer, once they have assessed that it may be safe or otherwise
they still want to proceed with this. The workflow for an app
developer then becomes:

* Write offending data migration (eg: add a column with a NOT NULL
constraint and default value)
* Test it locally, either by running automated test suite or running on staging
* See that it fails because of NOREWRITE option
* Assess situation. If it's a small table, or I still want to ignore,
override the option. Or rewrite migration to avoid rewrite.
* Repeat

I like this a lot just because it's simple, limited in scope, and can
be easily integrated into ORMs saving users hours of downtime and
frustration.

Thoughts?

On Thu, Oct 2, 2014 at 9:46 AM, Fabrízio de Royes Mello
<fabriziomello@gmail.com> wrote:
>
>
> On Thu, Oct 2, 2014 at 1:30 PM, Dimitri Fontaine <dimitri@2ndquadrant.fr>
> wrote:
>>
>> Hi fellow hackers,
>>
>> I would like to work on a new feature allowing our users to assess the
>> amount of trouble they will run into when running a DDL script on their
>> production setups, *before* actually getting their services down.
>>
>> The main practical example I can offer here is the ALTER TABLE command.
>> Recent releases are including very nice optimisations to it, so much so
>> that it's becoming increasingly hard to answer some very basic
>> questions:
>>
>>   - what kind of locks will be taken? (exclusive, shared)
>>   - on what objects? (foreign keys, indexes, sequences, etc)
>>   - will the table have to be rewritten? the indexes?
>>
>> Of course the docs are answering parts of those, but in particular the
>> table rewriting rules are complex enough that “accidental DBAs” will
>> fail to predict if the target data type is binary coercible to the
>> current one.
>>
>> Questions:
>>
>>  1. Do you agree that a systematic way to report what a DDL command (or
>>     script, or transaction) is going to do on your production database
>>     is a feature we should provide to our growing user base?
>>
>>  2. What do you think such a feature should look like?
>>
>>  3. Does it make sense to support the whole set of DDL commands from the
>>     get go (or ever) when most of them are only taking locks in their
>>     own pg_catalog entry anyway?
>>
>> Provided that we are able to converge towards a common enough answer to
>> those questions, I propose to hack my way around and send patches to
>> have it (the common answer) available in the next PostgreSQL release.
>>
>
> What you are proposing is some kind of "dry-run" with verbose output?
>
> Regards,
>
> --
> Fabrízio de Royes Mello
> Consultoria/Coaching PostgreSQL
>>> Timbira: http://www.timbira.com.br
>>> Blog: http://fabriziomello.github.io
>>> Linkedin: http://br.linkedin.com/in/fabriziomello
>>> Twitter: http://twitter.com/fabriziomello
>>> Github: http://github.com/fabriziomello



Re: DDL Damage Assessment

From
Joe Conway
Date:

On 10/02/2014 11:30 AM, Dimitri Fontaine wrote:
> Questions:
> 
> 1. Do you agree that a systematic way to report what a DDL command
> (or script, or transaction) is going to do on your production
> database is a feature we should provide to our growing user base?

+1

I really like the idea and would find it useful/time-saving

> 2. What do you think such a feature should look like?

Elsewhere on this thread EXPLAIN was suggested. That makes a certain
amount of sense.

Maybe something like EXPLAIN IMPACT [...]

> 3. Does it make sense to support the whole set of DDL commands from
> the get go (or ever) when most of them are only taking locks in
> their own pg_catalog entry anyway?

Yes, I think it should cover all commands that can have an
availability impact.

Joe


-- 
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support



Re: DDL Damage Assessment

From
"Joshua D. Drake"
Date:
On 10/02/2014 09:30 AM, Dimitri Fontaine wrote:

> Questions:
>
>   1. Do you agree that a systematic way to report what a DDL command (or
>      script, or transaction) is going to do on your production database
>      is a feature we should provide to our growing user base?

I would say it is late to the game and a great feature.

>
>   2. What do you think such a feature should look like?
>

I liked the other post that said: EXPLAIN <ALTER TABLE> or whatever.
Heck it could even be useful to have EXPLAIN ANALYZE <ALTER TABLE> in
case people want to run it on staging/test/dev environments to judge impact.

>   3. Does it make sense to support the whole set of DDL commands from the
>      get go (or ever) when most of them are only taking locks in their
>      own pg_catalog entry anyway?

I would think that introducing this incrementally makes sense.

JD



-- 
Command Prompt, Inc. - http://www.commandprompt.com/  503-667-4564
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, @cmdpromptinc
"If we send our children to Caesar for their education, we should             not be surprised when they come back as
Romans."



Re: DDL Damage Assessment

From
Stephen Frost
Date:
* Dimitri Fontaine (dimitri@2ndQuadrant.fr) wrote:
>  1. Do you agree that a systematic way to report what a DDL command (or
>     script, or transaction) is going to do on your production database
>     is a feature we should provide to our growing user base?

I definitely like the idea of such a 'dry-run' kind of operation to get
an idea of what would happen.

>  2. What do you think such a feature should look like?

My thinking is that this would be implemented as a new kind of read-only
transaction type.

>  3. Does it make sense to support the whole set of DDL commands from the
>     get go (or ever) when most of them are only taking locks in their
>     own pg_catalog entry anyway?

On the fence about this one..  In general, I'd say "yes", but I've not
looked at every case and I imagine there are DDL commands which really
aren't all that interesting for this case.

> Provided that we are able to converge towards a common enough answer to
> those questions, I propose to hack my way around and send patches to
> have it (the common answer) available in the next PostgreSQL release.

That feels a bit ambitious, given that we've not yet really nailed down
the feature definition yet, but I do like where you're going. :)
Thanks!
    Stephen

Re: DDL Damage Assessment

From
Stephen Frost
Date:
* Harold Giménez (harold@heroku.com) wrote:
> I think the main issue is when a table rewrite is triggered on a DDL
> command on a large table, as this is what frequently leads to
> unavailability. The idea of introducing a NOREWRITE keyword to DDL
> commands then came up (credit: Peter Geoghegan). When the NOREWRITE
> keyword is used and the DDL statement would rewrite the table, the
> command errors and exits.
>
> This would allow ORM and framework authors to include the NOREWRITE
> option by default, only to be disabled on a per-statement basis by the
> developer, once they have assessed that it may be safe or otherwise
> they still want to proceed with this. The workflow for an app
> developer then becomes:
>
> * Write offending data migration (eg: add a column with a NOT NULL
> constraint and default value)
> * Test it locally, either by running automated test suite or running on staging
> * See that it fails because of NOREWRITE option
> * Assess situation. If it's a small table, or I still want to ignore,
> override the option. Or rewrite migration to avoid rewrite.
> * Repeat
>
> I like this a lot just because it's simple, limited in scope, and can
> be easily integrated into ORMs saving users hours of downtime and
> frustration.
>
> Thoughts?

Not against it, but feels like an independent thing to consider- what
Dimitri is suggesting is broader and encompasses the issue of locks,
which are certainly important to consider also.

In short, seems like having both would be worthwhile.
Thanks,
    Stephen

Re: DDL Damage Assessment

From
Stephen Frost
Date:
* Joshua D. Drake (jd@commandprompt.com) wrote:
> >  2. What do you think such a feature should look like?
>
> I liked the other post that said: EXPLAIN <ALTER TABLE> or whatever.
> Heck it could even be useful to have EXPLAIN ANALZYE <ALTER TABLE>
> in case people want to run it on staging/test/dev environments to
> judge impact.

The downside of the 'explain' approach is that the script then has to be
modified to put 'explain' in front of everything and then you have to go
through each statement and consider it.  Having a 'dry-run' transaction
type which then produces a report at the end feels like it'd be both
easier to assess the overall implications, and less error-prone as you
don't have to prefix every statement with 'explain'.  It might even be
possible to have the local "view" of post-alter statements be available
inside of this 'dry-run' option- that is, if you add a column in the
transaction then the column exists to the following commands, so it
doesn't just error out.  Having 'explain <whatever>' wouldn't give you
that and so you really wouldn't be able to have whole scripts run by
just pre-pending each command with 'explain'.

> >  3. Does it make sense to support the whole set of DDL commands from the
> >     get go (or ever) when most of them are only taking locks in their
> >     own pg_catalog entry anyway?
>
> I would think that introducing this incrementally makes sense.

Agreed.
Thanks,
    Stephen

Re: DDL Damage Assessment

From
Josh Berkus
Date:
> Questions:
> 
>  1. Do you agree that a systematic way to report what a DDL command (or
>     script, or transaction) is going to do on your production database
>     is a feature we should provide to our growing user base?

Yes.

>  2. What do you think such a feature should look like?

As with others, I think EXPLAIN is a good way to do this without adding
a keyword.  So you'd do:

EXPLAIN
ALTER TABLE ....

... and it would produce a bunch of actions, available in either text or
JSON formats.  For example:

{ locks : [
    { lock_type: relation,    relation: table1,  lock_mode: ACCESS EXCLUSIVE },
    { lock_type: transaction },
    { lock_type: catalog,     catalogs: [pg_class, pg_attribute, pg_statistic],  lock_mode: EXCLUSIVE }
  ]
}
{ writes : [
    { object: relation files,  action: rewrite },
    { object: catalogs,        action: update }
  ]
}

... etc.  Would need a lot of refinement, but you get the idea.

>  3. Does it make sense to support the whole set of DDL commands from the
>     get go (or ever) when most of them are only taking locks in their
>     own pg_catalog entry anyway?

Well, eventually we'd want to support all of them just to avoid having
things be weird for users.  However, here's a priority order:

ALTER TABLE
CREATE TABLE
DROP TABLE
ALTER VIEW
CREATE VIEW
CREATE INDEX
DROP INDEX

... since all of the above can have unexpected secondary effects on
locking.  For example, if you create a table with FKs it will take an
ACCESS EXCLUSIVE lock on the FK targets.  And if you DROP a partition,
it takes an A.E. lock on the parent table.
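
A quick illustration of the first case (schema is invented): the
seemingly self-contained

    CREATE TABLE orders (
        id          serial PRIMARY KEY,
        customer_id integer REFERENCES customers (id)
    );

also has to lock the referenced "customers" table in order to add the FK
triggers, which is exactly the kind of secondary effect the report should
surface.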

> Provided that we are able to converge towards a common enough answer to
> those questions, I propose to hack my way around and send patches to
> have it (the common answer) available in the next PostgreSQL release.

Great!

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: DDL Damage Assessment

From
José Luis Tallón
Date:
On 10/02/2014 06:30 PM, Dimitri Fontaine wrote:
> Hi fellow hackers,
> [snip]
> Questions:
>
>  1. Do you agree that a systematic way to report what a DDL command (or
>     script, or transaction) is going to do on your production database
>     is a feature we should provide to our growing user base?

Yes, please

>  2. What do you think such a feature should look like?
EXPLAIN [(verbose, format)] [DDL_COMMAND]

as in:
EXPLAIN (verbose on, format text, impact on)
  ALTER TABLE emp
  ADD COLUMN foo2 jsonb NOT NULL DEFAULT '{}';

where the output would include something like:

  ...
  EXCLUSIVE LOCK ON TABLE emp;                                        // due to "IMPACT ON"
  REWRITE TABLE emp due to adding column foo2 (default='{}'::jsonb)   // due to "VERBOSE on"
  ...


> 3. Does it make sense to support the whole set of DDL commands from the
>     get go (or ever) when most of them are only taking locks in their
>     own pg_catalog entry anyway?

For completeness sake, yes.
But, unless the "impact" and "verbose" modifiers are specified, most
would be quite self-explanatory:

EXPLAIN (verbose on, impact on) TRUNCATE TABLE emp;
  Execution plan:
  -> EXCLUSIVE LOCK ON TABLE emp;
      ....
  -> truncate index: IIIIII (file=NNNNN)               // NNNNN = relfilenode
  -> truncate main fork: NNNNN (tablespace: TTTTT)     // NNNNN = relfilenode
  -> truncate visibility map
      ....
  -> RELEASE LOCK ON TABLE emp;
      ....
  Summary: ZZZZZ pages (MMM MB) would be freed

versus a simple:

EXPLAIN TRUNCATE TABLE emp;
  Execution plan:
  -> truncate index: emp_pkey
  -> truncate index: emp_foo2_idx
  -> truncate relation emp


> Provided that we are able to converge towards a common enough answer to
> those questions, I propose to hack my way around and send patches to
> have it (the common answer) available in the next PostgreSQL release.
>

Sounds very good, indeed.
Count on me as tester :)


-- 
    José Luis Tallón





Re: DDL Damage Assessment

From
Steven Lembark
Date:
> EXPLAIN ALTER TABLE ?

Good thing: People recognize it.
Bad thing:  People might not be able to tell the difference between
            a DDL and DML result.

What about "EXPLAIN DDL ..."?

The extra keyword ("DDL") makes it a bit more explicit that the 
results are not comparable to the standard explain output.

-- 
Steven Lembark                                             3646 Flora Pl
Workhorse Computing                                   St Louis, MO 63110
lembark@wrkhors.com                                      +1 888 359 3508



Re: DDL Damage Assessment

From
Claudio Freire
Date:
On Thu, Oct 2, 2014 at 4:40 PM, Stephen Frost <sfrost@snowman.net> wrote:
> * Joshua D. Drake (jd@commandprompt.com) wrote:
>> >  2. What do you think such a feature should look like?
>>
>> I liked the other post that said: EXPLAIN <ALTER TABLE> or whatever.
>> Heck it could even be useful to have EXPLAIN ANALZYE <ALTER TABLE>
>> in case people want to run it on staging/test/dev environments to
>> judge impact.
>
> The downside of the 'explain' approach is that the script then has to be
> modified to put 'explain' in front of everything and then you have to go
> through each statement and consider it.  Having a 'dry-run' transaction
> type which then produces a report at the end feels like it'd be both
> easier to assess the overall implications, and less error-prone as you
> don't have to prefex every statement with 'explain'.  It might even be
> possible to have the local "view" of post-alter statements be available
> inside of this 'dry-run' option- that is, if you add a column in the
> transaction then the column exists to the following commands, so it
> doesn't just error out.  Having 'explain <whatever>' wouldn't give you
> that and so you really wouldn't be able to have whole scripts run by
> just pre-pending each command with 'explain'.

That sounds extremely complex. You'd have to implement the fake
columns, foreign keys, indexes, etc on most execution nodes, the
planner, and even system views.

IMO, dry-run per se, is a BEGIN; stuff; ROLLBACK. But that still needs
locks. I don't think you can simulate the side effects without locks,
so getting the local view of changes will be extremely difficult
unless you limit the scope considerably.
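
For what it's worth, the poor man's version of that is already doable today,
locks and all (table and column names are just examples):

    BEGIN;
    ALTER TABLE emp ADD COLUMN note text;
    -- see which locks this session is now holding
    SELECT c.relname, l.mode, l.granted
      FROM pg_locks l
      JOIN pg_class c ON c.oid = l.relation
     WHERE l.pid = pg_backend_pid();
    ROLLBACK;

but of course it only tells you what happened after the locks were already
taken, rather than predicting it up front.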



Re: DDL Damage Assessment

From
Peter Geoghegan
Date:
On Thu, Oct 2, 2014 at 12:40 PM, Stephen Frost <sfrost@snowman.net> wrote:
> The downside of the 'explain' approach is that the script then has to be
> modified to put 'explain' in front of everything and then you have to go
> through each statement and consider it.  Having a 'dry-run' transaction
> type which then produces a report at the end feels like it'd be both
> easier to assess the overall implications, and less error-prone as you
> don't have to prefex every statement with 'explain'.  It might even be
> possible to have the local "view" of post-alter statements be available
> inside of this 'dry-run' option- that is, if you add a column in the
> transaction then the column exists to the following commands, so it
> doesn't just error out.  Having 'explain <whatever>' wouldn't give you
> that and so you really wouldn't be able to have whole scripts run by
> just pre-pending each command with 'explain'.

It's kind of tricky to implement a patch to figure this out ahead of
time. Some of the actual lock acquisitions are well hidden, in terms
of how the code is structured. In other cases, it may not even be
possible to determine ahead of time exactly what locks will be taken.

As Harold mentioned, another idea along the same lines would be to
decorate DDL with a NOWAIT "no locking assertion" and/or "no rewrite
assertion".  Basically, if this DDL (or perhaps any DDL, if this is
implemented as a GUC instead) necessitates a table rewrite (and
requires an AccessExclusiveLock), throw an error. That's the case that
most people care about.

This may not even be good enough, though. Consider:

Session 1 is a long running transaction. Maybe it's a spurious
idle-in-transaction situation, but it could also be totally
reasonable. It holds an AccessShareLock on some relation, as long
running transactions are inclined to do.

Session 2 is our migration. It needs an AccessExclusiveLock to ALTER
TABLE on the same relation (or whatever). But it doesn't need a
rewrite, which is good. It comes along and attempts to acquire the
lock, blocking on session 1.

Session 3 is an innocent bystander. It goes to query the same table in
an ordinary, routine way - a SELECT statement. Even though session 2's
lock is not granted yet, session 3 is not at liberty to skip the queue
and get its own AccessShareLock. The effect is about the same as if
session 2 did need to hold an AccessExclusiveLock for ages: read
queries block for a long time. And yet, in theory session 2's impact
on production should not be minimal, if we consider something like
EXPLAIN output.

Why is NOWAIT only supported for SET TABLESPACE? I guess it's just a
particularly bad case. NOWAIT might be the wrong thing for DDL
generally.

-- 
Peter Geoghegan



Re: DDL Damage Assessment

From
Peter Geoghegan
Date:
On Thu, Oct 2, 2014 at 1:37 PM, Peter Geoghegan <pg@heroku.com> wrote:
> And yet, in theory session 2's impact
> on production should not be minimal, if we consider something like
> EXPLAIN output.


"Should have been minimal", I mean.

-- 
Peter Geoghegan



Re: DDL Damage Assessment

From
Claudio Freire
Date:
On Thu, Oct 2, 2014 at 5:37 PM, Peter Geoghegan <pg@heroku.com> wrote:
> Session 3 is an innocent bystander. It goes to query the same table in
> an ordinary, routine way - a SELECT statement. Even though session 2's
> lock is not granted yet, session 3 is not at liberty to skip the queue
> and get its own AccessShareLock. The effect is about the same as if
> session 2 did need to hold an AccessExclusiveLock for ages: read
> queries block for a long time. And yet, in theory session 2's impact
> on production should not be minimal, if we consider something like
> EXPLAIN output.

The explain would show the AccessExclusiveLock, so it would be enough
for a heads-up to kill all idle-in-transaction holding locks on the
target relation (if killable, or just wait).

Granted, it's something that's not easily automatable, whereas a nowait is.

However, rather than nowait, I'd prefer "cancellable" semantics, that
would cancel voluntarily if any other transaction requests a
conflicting lock, like autovacuum does.



Re: DDL Damage Assessment

From
Stephen Frost
Date:
* Claudio Freire (klaussfreire@gmail.com) wrote:
> On Thu, Oct 2, 2014 at 4:40 PM, Stephen Frost <sfrost@snowman.net> wrote:
> > The downside of the 'explain' approach is that the script then has to be
> > modified to put 'explain' in front of everything and then you have to go
> > through each statement and consider it.  Having a 'dry-run' transaction
> > type which then produces a report at the end feels like it'd be both
> > easier to assess the overall implications, and less error-prone as you
> > don't have to prefex every statement with 'explain'.  It might even be
> > possible to have the local "view" of post-alter statements be available
> > inside of this 'dry-run' option- that is, if you add a column in the
> > transaction then the column exists to the following commands, so it
> > doesn't just error out.  Having 'explain <whatever>' wouldn't give you
> > that and so you really wouldn't be able to have whole scripts run by
> > just pre-pending each command with 'explain'.
>
> That sounds extremely complex. You'd have to implement the fake
> columns, foreign keys, indexes, etc on most execution nodes, the
> planner, and even system views.

Eh?  We have MVCC catalog access.

> IMO, dry-run per se, is a BEGIN; stuff; ROLLBACK. But that still needs
> locks. I don't think you can simulate the side effects without locks,

Why?  If you know the transaction is going to roll back and you only add
entries to the catalog which aren't visible to any other transactions
than your own, and you make sure that nothing you do actually writes
data out which is visible to other transactions..

> so getting the local view of changes will be extremely difficult
> unless you limit the scope considerably.

I agree that there may be complexities, but I'm not sure this is really
the issue..
Thanks,
    Stephen

Re: DDL Damage Assessment

From
Claudio Freire
Date:
On Thu, Oct 2, 2014 at 6:00 PM, Peter Geoghegan <pg@heroku.com> wrote:
>> Granted, it's something that's not easily automatable, whereas a nowait is.
>>
>> However, rather than nowait, I'd prefer "cancellable" semantics, that
>> would cancel voluntarily if any other transaction requests a
>> conflicting lock, like autovacuum does.
>
> I think the problem you'll have with NOWAIT is: you have an error from
> having to wait...what now? Do you restart? I imagine this would
> frequently result in what is effectively lock starvation. Any old
> AccessShareLock-er is going to make our migration tool restart. We'll
> never finish.

I've done that manually (throw the DDL, and cancel if it takes more
than a couple of seconds) on modest but relatively busy servers with
quite some success.
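
Since 9.3 that manual dance can be approximated with lock_timeout, e.g.
(value and table name picked arbitrarily):

    SET lock_timeout = '2s';
    ALTER TABLE emp ADD COLUMN note text;
    -- ERROR:  canceling statement due to lock timeout
    RESET lock_timeout;

which aborts the DDL instead of queueing behind a long-running transaction
and blocking everyone else, though as noted it can starve under constant load.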



Re: DDL Damage Assessment

From
Peter Geoghegan
Date:
On Thu, Oct 2, 2014 at 1:52 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
> The explain would show the AccessExclusiveLock, so it would be enough
> for a heads-up to kill all idle-in-transaction holding locks on the
> target relation (if killable, or just wait).

I think that there are very few problems with recognizing when an
AccessExclusiveLock is needed or not needed. The exceptions to the
rule that DDL needs such a lock are narrow enough that I have a hard
time believing that most people think about it, or even need to think
about it. I wish that wasn't the case, but it is.

> Granted, it's something that's not easily automatable, whereas a nowait is.
>
> However, rather than nowait, I'd prefer "cancellable" semantics, that
> would cancel voluntarily if any other transaction requests a
> conflicting lock, like autovacuum does.

I think the problem you'll have with NOWAIT is: you have an error from
having to wait...what now? Do you restart? I imagine this would
frequently result in what is effectively lock starvation. Any old
AccessShareLock-er is going to make our migration tool restart. We'll
never finish.

-- 
Peter Geoghegan



Re: DDL Damage Assessment

From
Claudio Freire
Date:
On Thu, Oct 2, 2014 at 6:03 PM, Stephen Frost <sfrost@snowman.net> wrote:
> * Claudio Freire (klaussfreire@gmail.com) wrote:
>> On Thu, Oct 2, 2014 at 4:40 PM, Stephen Frost <sfrost@snowman.net> wrote:
>> > The downside of the 'explain' approach is that the script then has to be
>> > modified to put 'explain' in front of everything and then you have to go
>> > through each statement and consider it.  Having a 'dry-run' transaction
>> > type which then produces a report at the end feels like it'd be both
>> > easier to assess the overall implications, and less error-prone as you
>> > don't have to prefex every statement with 'explain'.  It might even be
>> > possible to have the local "view" of post-alter statements be available
>> > inside of this 'dry-run' option- that is, if you add a column in the
>> > transaction then the column exists to the following commands, so it
>> > doesn't just error out.  Having 'explain <whatever>' wouldn't give you
>> > that and so you really wouldn't be able to have whole scripts run by
>> > just pre-pending each command with 'explain'.
>>
>> That sounds extremely complex. You'd have to implement the fake
>> columns, foreign keys, indexes, etc on most execution nodes, the
>> planner, and even system views.
>
> Eh?  We have MVCC catalog access.

And that needs locks, especially if you modify the underlying filesystem layout.

>> IMO, dry-run per se, is a BEGIN; stuff; ROLLBACK. But that still needs
>> locks. I don't think you can simulate the side effects without locks,
>
> Why?  If you know the transaction is going to roll back and you only add
> entries to the catalog which aren't visible to any other transactions
> than your own, and you make sure that nothing you do actually writes
> data out which is visible to other transactions..

But that's not the scope. If you want a dry-run of table-rewriting
DDL, or DDL interspersed with DML like:

alter table blargh add foo integer;
update blargh set foo = coalesce(bar, baz);

You really cannot hope not to have to write data. The above is also
the case with defaulted columns btw.

>> so getting the local view of changes will be extremely difficult
>> unless you limit the scope considerably.
>
> I agree that there may be complexities, but I'm not sure this is really
> the issue..

In essence, if you want MVCC catalog access without AEL, you're in for
a rough ride. I'm not as experienced with pg's core as you, so you
tell me, but I imagine it will be the case.



Re: DDL Damage Assessment

From
Peter Geoghegan
Date:
On Thu, Oct 2, 2014 at 2:04 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
> I've done that manually (throw the DDL, and cancel if it takes more
> than a couple of seconds) on modest but relatively busy servers with
> quite some success.


Fair enough, but that isn't the same as NOWAIT. It's something we'd
have a hard time coming up with a general-purpose timeout for.

-- 
Peter Geoghegan



Re: DDL Damage Assessment

From
Stephen Frost
Date:
* Peter Geoghegan (pg@heroku.com) wrote:
> On Thu, Oct 2, 2014 at 12:40 PM, Stephen Frost <sfrost@snowman.net> wrote:
> > The downside of the 'explain' approach is that the script then has to be
> > modified to put 'explain' in front of everything and then you have to go
> > through each statement and consider it.  Having a 'dry-run' transaction
> > type which then produces a report at the end feels like it'd be both
> > easier to assess the overall implications, and less error-prone as you
> > don't have to prefex every statement with 'explain'.  It might even be
> > possible to have the local "view" of post-alter statements be available
> > inside of this 'dry-run' option- that is, if you add a column in the
> > transaction then the column exists to the following commands, so it
> > doesn't just error out.  Having 'explain <whatever>' wouldn't give you
> > that and so you really wouldn't be able to have whole scripts run by
> > just pre-pending each command with 'explain'.
>
> It's kind of tricky to implement a patch to figure this out ahead of
> time. Some of the actual lock acquisitions are well hidden, in terms
> of how the code is structured. In others cases, it may not even be
> possible to determine ahead of time exactly what locks will be taken.

I was thinking this would be a new kind of transaction and we'd have to
teach parts of the system about it- yes, that's pretty invasive, but
it's at least one approach to consider.

> As Harold mentioned, another idea along the same lines would be to
> decorate DDL with a NOWAIT "no locking assertion" and/or "no rewrite
> assertion".  Basically, if this DDL (or perhaps any DDL, if this is
> implemented as a GUC instead) necessitates a table rewrite (and
> requires an AccessExclusiveLock), throw an error. That's the case that
> most people care about.

The problem I see with this approach is outlined above.  I agree that it
may be independently valuable, but I don't see it as being a solution to
the issue.

> This may not even be good enough, though. Consider:
>
> Session 1 is a long running transaction. Maybe it's a spurious
> idle-in-transaction situation, but it could also be totally
> reasonable. It holds an AccessShareLock on some relation, as long
> running transactions are inclined to do.
>
> Session 2 is our migration. It needs an AccessExclusiveLock to ALTER
> TABLE on the same relation (or whatever). But it doesn't need a
> rewrite, which is good. It comes along and attempts to acquire the
> lock, blocking on session 1.
>
> Session 3 is an innocent bystander. It goes to query the same table in
> an ordinary, routine way - a SELECT statement. Even though session 2's
> lock is not granted yet, session 3 is not at liberty to skip the queue
> and get its own AccessShareLock. The effect is about the same as if
> session 2 did need to hold an AccessExclusiveLock for ages: read
> queries block for a long time. And yet, in theory session 2's impact
> on production should not be minimal, if we consider something like
> EXPLAIN output.
>
> Why is NOWAIT only supported for SET TABLESPACE? I guess it's just a
> particularly bad case. NOWAIT might be the wrong thing for DDL
> generally.

I agree that this is a concern, but this feels to me like a next-step
over top of the "assess the locks required" transaction type which I am
trying to outline.  Specifically, having a way to take the report of
what locks are going to be required and then actually attempt to acquire
them all (or fail if any can't be granted immediately) would be a
natural next step and a way to start off the actual migration script-
either all get acquired and the script runs to completion, or a lock
isn't granted and the whole thing fails immediately without anything
actually being done.
Thanks,
    Stephen

Re: DDL Damage Assessment

From
Andres Freund
Date:
On 2014-10-02 17:03:59 -0400, Stephen Frost wrote:
> * Claudio Freire (klaussfreire@gmail.com) wrote:
> > On Thu, Oct 2, 2014 at 4:40 PM, Stephen Frost <sfrost@snowman.net> wrote:
> > > The downside of the 'explain' approach is that the script then has to be
> > > modified to put 'explain' in front of everything and then you have to go
> > > through each statement and consider it.  Having a 'dry-run' transaction
> > > type which then produces a report at the end feels like it'd be both
> > > easier to assess the overall implications, and less error-prone as you
> > > don't have to prefex every statement with 'explain'.  It might even be
> > > possible to have the local "view" of post-alter statements be available
> > > inside of this 'dry-run' option- that is, if you add a column in the
> > > transaction then the column exists to the following commands, so it
> > > doesn't just error out.  Having 'explain <whatever>' wouldn't give you
> > > that and so you really wouldn't be able to have whole scripts run by
> > > just pre-pending each command with 'explain'.
> > 
> > That sounds extremely complex. You'd have to implement the fake
> > columns, foreign keys, indexes, etc on most execution nodes, the
> > planner, and even system views.
> 
> Eh?  We have MVCC catalog access.

So you want to modify the catalog without actually doing the
corresponding actions? That'll be heck of invasive. With changes all
over the backend. We'll need to remove error checks (like for the
existence of relfilenodes), remove rewriting, and such.

Greetings,

Andres Freund

-- 
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: DDL Damage Assessment

From
Andres Freund
Date:
On 2014-10-02 13:49:36 -0300, Claudio Freire wrote:
> EXPLAIN ALTER TABLE ?

I don't think that'll work - there's already EXPLAIN for some CREATE. At
least CREATE TABLE ... AS, CREATE VIEW ... AS and SELECT INTO.

Greetings,

Andres Freund

-- 
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: DDL Damage Assessment

From
Stephen Frost
Date:
* Claudio Freire (klaussfreire@gmail.com) wrote:
> On Thu, Oct 2, 2014 at 6:03 PM, Stephen Frost <sfrost@snowman.net> wrote:
> >> That sounds extremely complex. You'd have to implement the fake
> >> columns, foreign keys, indexes, etc on most execution nodes, the
> >> planner, and even system views.
> >
> > Eh?  We have MVCC catalog access.
>
> And that needs locks, especially if you modify the underlying filesystem layout.

And we wouldn't be doing that, certainly.  It's a dry-run.

> >> IMO, dry-run per se, is a BEGIN; stuff; ROLLBACK. But that still needs
> >> locks. I don't think you can simulate the side effects without locks,
> >
> > Why?  If you know the transaction is going to roll back and you only add
> > entries to the catalog which aren't visible to any other transactions
> > than your own, and you make sure that nothing you do actually writes
> > data out which is visible to other transactions..
>
> But that's not the scope. If you want a dry-run of table-rewriting
> DDL, or DDL interspersed with DML like:
>
> alter table blargh add foo integer;
> update blargh set foo = coalesce(bar, baz);
>
> You really cannot hope not to have to write data. The above is also
> the case with defaulted columns btw.

The point is to not write anything which is visible to other
transactions, which means we'd have to put DML into some different
'mode' which doesn't actually write where other processes might be
looking.  I'm not saying it's trivial to do, but I don't think it's
impossible either.  We might also be able to simply get away with
short-circuiting them and not actually writing anything (and the same
for reading data..).  What would probably be useful is to review actual
migration scripts and see if this would really work.  I know they'd work
for at least a subset of the migration scripts that I've dealt with
before, and also not all of them.

> >> so getting the local view of changes will be extremely difficult
> >> unless you limit the scope considerably.
> >
> > I agree that there may be complexities, but I'm not sure this is really
> > the issue..
>
> In essence, if you want MVCC catalog access without AEL, you're in for
> a rough ride. I'm not as experienced with pg's core as you, so you
> tell me, but I imagine it will be the case.

It's not clear to me what you're getting at as the 'rough' part,
exactly..
Thanks,
    Stephen

Re: DDL Damage Assessment

From
Stephen Frost
Date:
* Andres Freund (andres@2ndquadrant.com) wrote:
> On 2014-10-02 17:03:59 -0400, Stephen Frost wrote:
> > > That sounds extremely complex. You'd have to implement the fake
> > > columns, foreign keys, indexes, etc on most execution nodes, the
> > > planner, and even system views.
> >
> > Eh?  We have MVCC catalog access.
>
> So you want to modify the catalog without actually doing the
> corresponding actions? That'll be heck of invasive. With changes all
> over the backend. We'll need to remove error checks (like for the
> existance of relfilenodes), remove rewriting, and such.

Yeah, I was getting at it being rather invasive earlier.  It really
depends on exactly what we'd support in this mode, which would depend on
just what would be invasive and what wouldn't, I expect.  I dislike the
idea of not being able to actually run a real migration script though as
anything else opens the very real possibility that the real script and
the 'explain' script don't do the same thing, making this capability not
nearly as useful..
Thanks,
    Stephen

Re: DDL Damage Assessment

From
Claudio Freire
Date:
On Thu, Oct 2, 2014 at 6:19 PM, Stephen Frost <sfrost@snowman.net> wrote:
> * Claudio Freire (klaussfreire@gmail.com) wrote:
>> On Thu, Oct 2, 2014 at 6:03 PM, Stephen Frost <sfrost@snowman.net> wrote:
>> >> That sounds extremely complex. You'd have to implement the fake
>> >> columns, foreign keys, indexes, etc on most execution nodes, the
>> >> planner, and even system views.
>> >
>> > Eh?  We have MVCC catalog access.
>>
>> And that needs locks, especially if you modify the underlying filesystem layout.
>
> And we wouldn't be doing that, certainly.  It's a dry-run.

...

> (...) We might also be able to simply get away with
> short-circuiting them and not actually writing anything (and the same
> for reading data..).  What would probably be useful is to review actual
> migration scripts and see if this would really work.  I know they'd work
> for at least a subset of the migration scripts that I've dealt with
> before, and also not all of them.

I believe, for it to be reasonably unintrusive, you'd have to abort at
the first need to actually read/write data.

Most of my migration scripts, especially the ones that would benefit
from this, require some of that, but that's just my personal common
practice, not a general case.

>> >> IMO, dry-run per se, is a BEGIN; stuff; ROLLBACK. But that still needs
>> >> locks. I don't think you can simulate the side effects without locks,
>> >
>> > Why?  If you know the transaction is going to roll back and you only add
>> > entries to the catalog which aren't visible to any other transactions
>> > than your own, and you make sure that nothing you do actually writes
>> > data out which is visible to other transactions..
>>
>> But that's not the scope. If you want a dry-run of table-rewriting
>> DDL, or DDL interspersed with DML like:
>>
>> alter table blargh add foo integer;
>> update blargh set foo = coalesce(bar, baz);
>>
>> You really cannot hope not to have to write data. The above is also
>> the case with defaulted columns btw.
>
> The point is to not write anything which is visible to other
> transactions, which means we'd have to put DML into some different
> 'mode' which doesn't actually write where other processes might be
> looking.  I'm not saying it's trivial to do, but I don't think it's
> impossible either.  (...)

No, I don't think it's impossible either. Just very, very
time-consuming. Both in developing the patch and in its eventual
maintenance.

TBH, a separate read-only transaction like explain alter would also be
quite difficult to keep in sync with actual alter logic, unless it's
handled by the same code (unlikely in that form).

>> >> so getting the local view of changes will be extremely difficult
>> >> unless you limit the scope considerably.
>> >
>> > I agree that there may be complexities, but I'm not sure this is really
>> > the issue..
>>
>> In essence, if you want MVCC catalog access without AEL, you're in for
>> a rough ride. I'm not as experienced with pg's core as you, so you
>> tell me, but I imagine it will be the case.
>
> It's not clear to me what you're getting at as the 'rough' part,
> exactly..

A lot of work, touching most of the codebase, and hard to maintain in the end.

Unless you limit the scope, as you said, just touching the catalog,
not data, could be doable. But it would act as an implicit
"norewrite".



Re: DDL Damage Assessment

From
Alvaro Herrera
Date:
Peter Geoghegan wrote:

> Why is NOWAIT only supported for SET TABLESPACE? I guess it's just a
> particularly bad case. NOWAIT might be the wrong thing for DDL
> generally.

I didn't realize we supported NOWAIT for SET TABLESPACE.  My hunch is
that if we have that, it should really be supported for anything that
does a table rewrite, or perhaps even anything that requires a full
table scan (such as adding a new CHECK constraint).

OTOH it does seem a wart to have NOWAIT in alter table generally.  You
can get the same effect by doing a LOCK TABLE .. NOWAIT and then ALTER
TABLE, right?
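
Something along these lines, presumably (table and tablespace names invented):

    BEGIN;
    LOCK TABLE emp IN ACCESS EXCLUSIVE MODE NOWAIT;
    -- fails immediately with "could not obtain lock" if anyone holds the table
    ALTER TABLE emp SET TABLESPACE fast_ssd;
    COMMIT;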

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: DDL Damage Assessment

From
Alvaro Herrera
Date:
Dimitri Fontaine wrote:

> The main practical example I can offer here is the ALTER TABLE command.
> Recent releases are including very nice optimisations to it, so much so
> that it's becoming increasingly hard to answer some very basic
> questions:
> 
>   - what kind of locks will be taken? (exclusive, shared)
>   - on what objects? (foreign keys, indexes, sequences, etc)
>   - will the table have to be rewritten? the indexes?

Please give my DDL deparsing patch a look.  There is a portion there
about deparsing ALTER TABLE specifically; what it does is save a list of
subcommands, and for each of them we either report the OID of the object
affected (for example in ADD CONSTRAINT), or a column number (for ALTER
COLUMN RENAME, say).  It sounds like you would like to have some extra
details returned: for instance the "does the whole of it require a table
rewrite" bit.  It sounds like it can be trivially returned in the JSON
object as a boolean (not sure if it would be attached to each individual
subcommand, or at the global level), even if it doesn't play a role in
the "fmt" string --- similar to how we're labelling different kinds of
constraints in an ADD CONSTRAINT subcommand with the type of constraint
being added.

Not real sure if the functionality you propose would be just returned in
the JSON object (so the user would be in charge of extracting it, or
perhaps we provide additional auxiliary functions that examine the
JSON), or separately.

Elsewhere Pavel was complaining that JSON is not plpgsql-friendly;
something to consider here too.

Anyway if you're able to help the DDL deparse patch by implementing some
of the missing commands (which could help you with your proposal here),
that would be much welcome.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: DDL Damage Assessment

From
Dimitri Fontaine
Date:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>>   - will the table have to be rewritten? the indexes?
>
> Please give my DDL deparsing patch a look.  There is a portion there
> about deparsing ALTER TABLE specifically; what it does is save a list of
> subcommands, and for each of them we either report the OID of the object
> affected (for example in ADD CONSTRAINT), or a column number (for ALTER
> COLUMN RENAME, say).  It sounds like you would like to have some extra
> details returned: for instance the "does the whole of it require a table
> rewrite" bit.  It sounds like it can be trivially returned in the JSON

Some years ago when working on the Event Trigger framework we did
mention providing some interesting events, such as a TableRewrite Event.

In between what you're saying here and what Harold and Peter Geoghegan
are mentioning (basically that dealing with table rewrites is 90% of
the need for them), it could be that the best way to have at it would be
to add that Event in the Event Trigger mechanism.

We could also add an AccessExclusiveLock Event that would fire just
before actually taking the lock, allowing people to RAISE EXCEPTION in
that case, or to maybe just do the LOCK … NOWAIT themselves in the
trigger.
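
Until such events exist, the closest approximation with today's Event Trigger
machinery is a coarse guard on ddl_command_start (a sketch only, not what the
proposed norewrite extension would actually look like; names are invented):

    CREATE FUNCTION forbid_alter_table() RETURNS event_trigger
    LANGUAGE plpgsql AS $$
    BEGIN
        RAISE EXCEPTION '% is blocked on this database by policy', tg_tag;
    END;
    $$;

    CREATE EVENT TRIGGER no_alter_table
        ON ddl_command_start
        WHEN TAG IN ('ALTER TABLE')
        EXECUTE PROCEDURE forbid_alter_table();

but that fires whether or not a rewrite or strong lock is involved, which is
exactly why the finer-grained events above would be useful.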

For the locking parts, best would be to do the LOCK … NOWAIT dance for
all the tables touched by the DDL migration script. The Event Trigger
approach will not solve that, unfortunately.

Regards,
--
Dimitri Fontaine                                        06 63 07 10 78
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support



Re: DDL Damage Assessment

From
Jan Wieck
Date:
On 10/02/2014 01:15 PM, Joe Conway wrote:
>
> On 10/02/2014 11:30 AM, Dimitri Fontaine wrote:
>> Questions:
>>
>> 1. Do you agree that a systematic way to report what a DDL command
>> (or script, or transaction) is going to do on your production
>> database is a feature we should provide to our growing user base?
>
> +1
>
> I really like the idea and would find it useful/time-saving
>
>> 2. What do you think such a feature should look like?
>
> Elsewhere on this thread EXPLAIN was suggested. That makes a certain
> amount of sense.
>
> Maybe something like EXPLAIN IMPACT [...]
>
>> 3. Does it make sense to support the whole set of DDL commands from
>> the get go (or ever) when most of them are only taking locks in
>> their own pg_catalog entry anyway?
>
> Yes, I think it should cover all commands that can have an
> availability impact.

In principle I agree with the sentiment. However, that full coverage is 
a nice goal, seldom achieved.

The real question is at what level of information, returned to the user, 
does this feature become user friendly?

It is one thing to provide information of the kind of

    TAKE ACCESS EXCLUSIVE LOCK ON TABLE foo
    TEST EVERY ROW IN TABLE foo FOR FK (a, b) IN bar (id1, id2)

That information is useful, but only to an experienced DBA who knows 
their schema and data to a certain degree. The majority of users, I 
fear, will not be able to even remotely guesstimate if that will need 
seconds or hours.

There needs to be more detail information for those cases and I believe 
that tackling them one at a time in depth will lead to more useful 
results than trying to cover a lot but shallow.


My $.02
Jan

-- 
Jan Wieck
Senior Software Engineer
http://slony.info



Re: DDL Damage Assessment

From
Joe Conway
Date:

On 10/02/2014 06:43 PM, Jan Wieck wrote:
> The real question is at what level of information, returned to the
> user, does this feature become user friendly?
> 
> It is one thing to provide information of the kind of
> 
> TAKE ACCECSS EXCLUSIVE LOCK ON TABLE foo TEST EVERY ROW IN TABLE
> foo FOR FK (a, b) IN bar (id1, id2)
> 
> That information is useful, but only to an experienced DBA who
> knows their schema and data to a certain degree. The majority of
> users, I fear, will not be able to even remotely guesstimate if
> that will need seconds or hours.
> 
> There needs to be more detail information for those cases and I
> believe that tackling them one at a time in depth will lead to more
> useful results than trying to cover a lot but shallow.

Perhaps.

This and other posts on this thread make me wonder if some kind of
extension to the stats collector and pg_stat_statements might be a good
place to start?

Joe

-- 
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support



Re: DDL Damage Assessment

From
Stephen Frost
Date:
* Claudio Freire (klaussfreire@gmail.com) wrote:
> On Thu, Oct 2, 2014 at 6:19 PM, Stephen Frost <sfrost@snowman.net> wrote:
> >> And that needs locks, especially if you modify the underlying filesystem layout.
> >
> > And we wouldn't be doing that, certainly.  It's a dry-run.
>
> ...
>
> > (...) We might also be able to simply get away with
> > short-circuiting them and not actually writing anything (and the same
> > for reading data..).  What would probably be useful is to review actual
> > migration scripts and see if this would really work.  I know they'd work
> > for at least a subset of the migration scripts that I've dealt with
> > before, and also not all of them.
>
> I believe, for it to be reasonably intrusive, you'd have to abort at
> the first need to actually read/write data.
>
> Most of my migration scripts, especially the ones that would benefit
> from this, require some of that, but that's just my personal common
> practice, not a general case.

I realize we're moving into real pie-in-the-sky ideas here, but I wonder
if we could have a way to either create a new 'delta' fork which
maintains the pages which have been changed..  Or perhaps actually have
a new relfilenode with a reference back to the original for any empty
pages..

> > The point is to not write anything which is visible to other
> > transactions, which means we'd have to put DML into some different
> > 'mode' which doesn't actually write where other processes might be
> > looking.  I'm not saying it's trivial to do, but I don't think it's
> > impossible either.  (...)
>
> No, I don't think it's impossible either. Just very, very
> time-consuming. Both in developing the patch and in its eventual
> maintenance.

I agree that we don't want a completely independent code path for this
or it won't be as useful- and that's a concern for the EXPLAIN idea too.

> TBH, a separate read-only transaction like explain alter would also be
> quite difficult to keep in sync with actual alter logic, unless it's
> handled by the same code (unlikely in that form).

Right.
Thanks,
    Stephen

Re: DDL Damage Assessment

From
Jim Nasby
Date:
On 10/2/14, 2:43 PM, Josh Berkus wrote:
>> >Questions:
>> >
>> >  1. Do you agree that a systematic way to report what a DDL command (or
>> >     script, or transaction) is going to do on your production database
>> >     is a feature we should provide to our growing user base?
> Yes.
+1
>> >  2. What do you think such a feature should look like?
> As with others, I think EXPLAIN is a good way to do this without adding
> a keyword.  So you'd do:
>
> EXPLAIN
> ALTER TABLE ....
I'm thinking it would be better to have something you could set at a session
level, so you don't have to stick EXPLAIN in front of all your DDL.
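As a purely hypothetical illustration of that idea (neither the setting nor
the output exist today; the names are invented):

    -- Hypothetical: no such GUC exists, shown only to illustrate the idea.
    SET ddl_report = on;
    ALTER TABLE foo ALTER COLUMN a TYPE bigint;
    -- NOTICE:  ALTER TABLE will take an ACCESS EXCLUSIVE lock on "foo"
    -- NOTICE:  ALTER TABLE will rewrite table "foo" and its indexes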
 

As for the dry-run idea, I don't think that's really necessary. I've never seen anyone serious that doesn't have a
development environment, which is where you would simply deploy the real DDL using "verbose" mode and see what the
underlying commands actually do.



Re: DDL Damage Assessment

From
Claudio Freire
Date:
On Fri, Oct 3, 2014 at 12:07 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
> On 10/2/14, 2:43 PM, Josh Berkus wrote:
>>>
>>> >Questions:
>>> >
>>> >  1. Do you agree that a systematic way to report what a DDL command (or
>>> >     script, or transaction) is going to do on your production database
>>> >     is a feature we should provide to our growing user base?
>>
>> Yes.
>
> +1
>>>
>>> >  2. What do you think such a feature should look like?
>>
>> As with others, I think EXPLAIN is a good way to do this without adding
>> a keyword.  So you'd do:
>>
>> EXPLAIN
>> ALTER TABLE ....
>
> I'm thinking it would be better to have something you could set at a session
> level, so you don't have to stick EXPLAIN in front of all your DDL.
>
> As for the dry-run idea, I don't think that's really necessary. I've never
> seen anyone serious that doesn't have a development environment, which is
> where you would simply deploy the real DDL using "verbose" mode and see what
> the underlying commands actually do.


Well, the thing that's difficult to reproduce in a development
environment is the locking issues.

It may run perfectly on an idle system, only to lock up forever (or
for an unacceptably long time) on a busy one.

That is probably one concern that can be attacked relatively cheaply
with some clever locking / cancelling logic.
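For instance, here is a minimal sketch of that kind of logic using only what
already exists today (lock_timeout is available since 9.3; the table and
column names are made up):

    -- Fail fast instead of queueing behind long-running transactions.
    SET lock_timeout = '2s';

    BEGIN;
    -- Grab the lock explicitly first; the script errors out after 2s
    -- instead of blocking the application indefinitely.
    LOCK TABLE foo IN ACCESS EXCLUSIVE MODE;
    ALTER TABLE foo ALTER COLUMN a TYPE bigint;
    COMMIT;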



Re: DDL Damage Assessment

From
Dimitri Fontaine
Date:
Jim Nasby <Jim.Nasby@BlueTreble.com> writes:
>> EXPLAIN
>> ALTER TABLE ....
> I'm thinking it would be better to have something you could set at a session
> level, so you don't have to stick EXPLAIN in front of all your DDL.

Yeah I'm coming into that camp too, and I think the Event Trigger idea
gets us halfway there. Here's a detailed sketch of how it would work:
  1. preparatory steps: install the Event Trigger

     create extension norewrite;

  2. test run:

     psql -1 -f ddl.sql
     ERROR: Table Rewrite has been cancelled.

  3. Well actually we need to run that thing in production

     BEGIN;
       ALTER EVENT TRIGGER norewrite DISABLE;
       \i ddl.sql
       ALTER EVENT TRIGGER norewrite ENABLE;
     COMMIT;
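For illustration, a minimal sketch of what such a norewrite extension could
install, assuming an event trigger firing on table rewrites with an accessor
like pg_event_trigger_table_rewrite_oid() is available (both are assumptions
here, not something we ship today):

    -- Sketch only: assumes a "table_rewrite" event and the
    -- pg_event_trigger_table_rewrite_oid() accessor are available.
    CREATE FUNCTION no_rewrite()
      RETURNS event_trigger
      LANGUAGE plpgsql AS $$
    BEGIN
      RAISE EXCEPTION 'Table Rewrite has been cancelled: %',
                      pg_event_trigger_table_rewrite_oid()::regclass;
    END;
    $$;

    CREATE EVENT TRIGGER norewrite
      ON table_rewrite
      EXECUTE PROCEDURE no_rewrite();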

Then it's also possible to have another Event Trigger that would
automatically issue a LOCK <table> NOWAIT; command before any DDL
against a table is run, in another extension:
 create extension ddl_lock_nowait;

The same applies: if your production rollout is blocked repeatedly and
you want to force it through at some point, it's possible to disable the
event trigger within the DDL script/transaction.

> As for the dry-run idea, I don't think that's really necessary. I've never
> seen anyone serious that doesn't have a development environment, which is
> where you would simply deploy the real DDL using "verbose" mode and see what
> the underlying commands actually do.

The major drawback of the Event Trigger idea is that the transaction is
cancelled as soon as a Rewrite Event is fired when you have installed
the protective trigger. It means that you won't see the next problem
after the first one, so it's not a dry-run.

But considering what you're saying here, it might well be enough.

Regards,
-- 
Dimitri Fontaine                                        06 63 07 10 78
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support



Re: DDL Damage Assessment

From
Robert Haas
Date:
On Thu, Oct 2, 2014 at 5:22 PM, Stephen Frost <sfrost@snowman.net> wrote:
> * Andres Freund (andres@2ndquadrant.com) wrote:
>> On 2014-10-02 17:03:59 -0400, Stephen Frost wrote:
>> > > That sounds extremely complex. You'd have to implement the fake
>> > > columns, foreign keys, indexes, etc on most execution nodes, the
>> > > planner, and even system views.
>> >
>> > Eh?  We have MVCC catalog access.
>>
>> So you want to modify the catalog without actually doing the
>> corresponding actions? That'll be heck of invasive. With changes all
>> over the backend. We'll need to remove error checks (like for the
>> existence of relfilenodes), remove rewriting, and such.
>
> Yeah, I was getting at it being rather invasive earlier.  It really
> depends on exactly what we'd support in this mode, which would depend on
> just what would be invasive and what wouldn't, I expect.  I dislike the
> idea of not being able to actually run a real migration script though as
> anything else opens the very real possibility that the real script and
> the 'explain' script don't do the same thing, making this capability not
> nearly as useful..

I think this is the real issue.  Somebody's got to come up with an
infrastructure for reporting locks to be taken without actually
performing the corresponding actions, and it's got to go through
basically the same code paths as it would if we really did it.  That's
going to be complicated.  But if we make the EXPLAIN code completely
separate from the real code, it will be easy for them to get out of
sync with each other, and then the answers will be wrong.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: DDL Damage Assessment

From
Stephen Frost
Date:
* Jim Nasby (Jim.Nasby@BlueTreble.com) wrote:
> I'm thinking it would be better to have something you could set at a session level, so you don't have to stick
> EXPLAIN in front of all your DDL.

Right, I'm agreed there.

> As for the dry-run idea, I don't think that's really necessary. I've never seen anyone serious that doesn't have a
> development environment, which is where you would simply deploy the real DDL using "verbose" mode and see what the
> underlying commands actually do.

That's certainly an interesting point and perhaps what we'd do is,
instead, have a "collect info on locks needed" mode, but otherwise let
everything run as-is.  You could then take the report at the end of the
transaction and use it to identify what would be needed in production
and maybe even have a script created which grabs all the locks using
'nowait' or fails the whole thing if it isn't possible..

Of course, we kind of have that already...  Just look at the locks
you've acquired at the end of the transaction..
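For example, something along these lines already works today (a sketch; the
table and column names are made up):

    BEGIN;
    ALTER TABLE foo ALTER COLUMN a TYPE bigint;
    -- Just before COMMIT, list the locks this backend has acquired.
    SELECT locktype, relation::regclass AS relation, mode
      FROM pg_locks
     WHERE pid = pg_backend_pid()
       AND granted;
    COMMIT;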
Thanks,
    Stephen

Re: DDL Damage Assessment

From
José Luis Tallón
Date:
On 10/03/2014 11:02 AM, Dimitri Fontaine wrote:
> Jim Nasby <Jim.Nasby@BlueTreble.com> writes:
>>> EXPLAIN
>>> ALTER TABLE ....
>> I'm thinking it would be better to have something you could set at a session
>> level, so you don't have to stick EXPLAIN in front of all your DDL.

We were considering the potential needs of "accidental DBAs" here at 
first, if memory serves me well.
I definitely see the value of EXPLAIN [DDL STATEMENT]... even if 
implemented as a regular "this is what should happen", without even 
attempting to run a thing (not even "dry run transactions"), but 
including the full catalog search / attribute resolution etc.
Providing insight into the locking that would happen sounds like a real 
life-saver for many real-life situations (e.g. would this ALTER 
completely lock my application due to the constant logging-to-table?).
This is, obviously, just IMHO.
> Yeah I'm coming into that camp too, and I think the Event Trigger idea
> gets us halfway there. Here's a detailed sketch of how it would work:
>
>   1. preparatory steps: install the Event Trigger
>   
>      create extension norewrite;
>
>   2. test run:
>
>      psql -1 -f ddl.sql
>      ERROR: Table Rewrite has been cancelled.
>
>   3. Well actually we need to run that thing in production
>
>      BEGIN;
>        ALTER EVENT TRIGGER norewrite DISABLE;
>        \i ddl.sql
>        ALTER EVENT TRIGGER norewrite ENABLE;
>      COMMIT;
>
> Then it's also possible to have another Event Trigger that would
> automatically issue a LOCK <table> NOWAIT; command before any DDL
> against a table is run, in another extension:
>
>    create extension ddl_lock_nowait;
>
> The same applies: if your production rollout is blocked repeatedly and
> you want to force it through at some point, it's possible to disable the
> event trigger within the DDL script/transaction.

This serves a different purpose which is at least as worthwhile as the 
former: provide a real "dry run" mechanism for advanced users.
Stephen's "delta" fork sounds like a promising approach... even if a 
bit too "Oracle-ish" (it sounds an awful lot like UNDO logs!) for my liking.

>> As for the dry-run idea, I don't think that's really necessary. I've never
>> seen anyone serious that doesn't have a development environment, which is
>> where you would simply deploy the real DDL using "verbose" mode and see what
>> the underlying commands actually do.
> The major drawback of the Event Trigger idea is that the transaction is
> cancelled as soon as a Rewrite Event is fired when you have installed
> the protective trigger. It means that you won't see the next problem
> after the first one, so it's not a dry-run.
>
> But considering what you're saying here, it might well be enough.

It is a very convenient first step (minimally invasive, and good use of 
existing infrastructure)... since it allows an easy testing phase in 
order to iron out potential shortcomings and gather input on some other 
applications.


Thanks,
    / Jose