Thread: DDL Damage Assessment
Hi fellow hackers,

I would like to work on a new feature allowing our users to assess the
amount of trouble they will run into when running a DDL script on their
production setups, *before* actually getting their services down.

The main practical example I can offer here is the ALTER TABLE command.
Recent releases are including very nice optimisations to it, so much so
that it's becoming increasingly hard to answer some very basic
questions:

 - what kind of locks will be taken? (exclusive, shared)
 - on what objects? (foreign keys, indexes, sequences, etc)
 - will the table have to be rewritten? the indexes?

Of course the docs are answering parts of those, but in particular the
table rewriting rules are complex enough that “accidental DBAs” will
fail to predict if the target data type is binary coercible to the
current one.

Questions:

 1. Do you agree that a systematic way to report what a DDL command (or
    script, or transaction) is going to do on your production database
    is a feature we should provide to our growing user base?

 2. What do you think such a feature should look like?

 3. Does it make sense to support the whole set of DDL commands from
    the get go (or ever) when most of them are only taking locks in
    their own pg_catalog entry anyway?

Provided that we are able to converge towards a common enough answer to
those questions, I propose to hack my way around and send patches to
have it (the common answer) available in the next PostgreSQL release.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support
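For illustration, a minimal sketch of the rewrite rules Dimitri
mentions, using an invented table. Comparing pg_class.relfilenode
before and after an ALTER TABLE is one way to tell whether the table
was rewritten: the file node changes on a rewrite.

    CREATE TABLE items (code varchar(20), qty integer);

    -- Note the current on-disk file for the table.
    SELECT relfilenode FROM pg_class WHERE relname = 'items';

    -- varchar -> text is binary coercible: no rewrite, just a catalog
    -- update (an ACCESS EXCLUSIVE lock is still taken, but briefly).
    ALTER TABLE items ALTER COLUMN code TYPE text;

    -- integer -> bigint changes the on-disk format: every row is
    -- rewritten while the ACCESS EXCLUSIVE lock is held throughout.
    ALTER TABLE items ALTER COLUMN qty TYPE bigint;

    -- A different relfilenode here means the table was rewritten.
    SELECT relfilenode FROM pg_class WHERE relname = 'items';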
On Thu, Oct 2, 2014 at 1:30 PM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote:
>
> Hi fellow hackers,
>
> I would like to work on a new feature allowing our users to assess the
> amount of trouble they will run into when running a DDL script on their
> production setups, *before* actually getting their services down.
>
> The main practical example I can offer here is the ALTER TABLE command.
> Recent releases are including very nice optimisations to it, so much so
> that it's becoming increasingly hard to answer some very basic
> questions:
>
> - what kind of locks will be taken? (exclusive, shared)
> - on what objects? (foreign keys, indexes, sequences, etc)
> - will the table have to be rewritten? the indexes?
>
> Of course the docs are answering parts of those, but in particular the
> table rewriting rules are complex enough that “accidental DBAs” will
> fail to predict if the target data type is binary coercible to the
> current one.
>
> Questions:
>
> 1. Do you agree that a systematic way to report what a DDL command (or
> script, or transaction) is going to do on your production database
> is a feature we should provide to our growing user base?
>
> 2. What do you think such a feature should look like?
>
> 3. Does it make sense to support the whole set of DDL commands from the
> get go (or ever) when most of them are only taking locks in their
> own pg_catalog entry anyway?
>
> Provided that we are able to converge towards a common enough answer to
> those questions, I propose to hack my way around and send patches to
> have it (the common answer) available in the next PostgreSQL release.
>
What you are proposing is some kind of "dry-run" with verbose output?

Regards,
--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Github: http://github.com/fabriziomello
On Thu, Oct 2, 2014 at 1:46 PM, Fabrízio de Royes Mello
<fabriziomello@gmail.com> wrote:
> On Thu, Oct 2, 2014 at 1:30 PM, Dimitri Fontaine <dimitri@2ndquadrant.fr>
> wrote:
>> [snip]
>
> What you are proposing is some kind of "dry-run" with verbose output?

EXPLAIN ALTER TABLE ?
I think the main issue is when a table rewrite is triggered on a DDL
command on a large table, as this is what frequently leads to
unavailability. The idea of introducing a NOREWRITE keyword to DDL
commands then came up (credit: Peter Geoghegan). When the NOREWRITE
keyword is used and the DDL statement would rewrite the table, the
command errors and exits.

This would allow ORM and framework authors to include the NOREWRITE
option by default, to be disabled on a per-statement basis by the
developer once they have assessed that it is safe or that they still
want to proceed with it. The workflow for an app developer then
becomes:

* Write offending data migration (eg: add a column with a NOT NULL
  constraint and default value; see the sketch after this message)
* Test it locally, either by running automated test suite or running
  on staging
* See that it fails because of NOREWRITE option
* Assess situation. If it's a small table, or I still want to ignore,
  override the option. Or rewrite migration to avoid rewrite.
* Repeat

I like this a lot just because it's simple, limited in scope, and can
be easily integrated into ORMs, saving users hours of downtime and
frustration.

Thoughts?

On Thu, Oct 2, 2014 at 9:46 AM, Fabrízio de Royes Mello
<fabriziomello@gmail.com> wrote:
> On Thu, Oct 2, 2014 at 1:30 PM, Dimitri Fontaine <dimitri@2ndquadrant.fr>
> wrote:
>> [snip]
>
> What you are proposing is some kind of "dry-run" with verbose output?
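For context, a sketch of the offending migration Harold describes and
the usual way to rewrite it to avoid the table rewrite. Table and
column names are invented, and the NOREWRITE keyword itself is only a
proposal here, not existing syntax.

    -- The offending form: on the releases discussed here, this
    -- rewrites the whole table under an ACCESS EXCLUSIVE lock.
    ALTER TABLE users ADD COLUMN active boolean NOT NULL DEFAULT true;
    -- (Under the proposal, appending NOREWRITE to the statement above
    -- would make it error out instead of rewriting.)

    -- A rewrite-avoiding equivalent: each step holds the lock only
    -- briefly, and the backfill can be batched.
    ALTER TABLE users ADD COLUMN active boolean;             -- catalog-only
    ALTER TABLE users ALTER COLUMN active SET DEFAULT true;  -- catalog-only
    UPDATE users SET active = true WHERE active IS NULL;     -- backfill
    ALTER TABLE users ALTER COLUMN active SET NOT NULL;      -- full scan,
                                                             -- no rewrite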
On 10/02/2014 11:30 AM, Dimitri Fontaine wrote:
> Questions:
>
> 1. Do you agree that a systematic way to report what a DDL command
> (or script, or transaction) is going to do on your production
> database is a feature we should provide to our growing user base?

+1

I really like the idea and would find it useful/time-saving.

> 2. What do you think such a feature should look like?

Elsewhere on this thread EXPLAIN was suggested. That makes a certain
amount of sense. Maybe something like EXPLAIN IMPACT [...]

> 3. Does it make sense to support the whole set of DDL commands from
> the get go (or ever) when most of them are only taking locks in
> their own pg_catalog entry anyway?

Yes, I think it should cover all commands that can have an
availability impact.

Joe

--
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support
On 10/02/2014 09:30 AM, Dimitri Fontaine wrote:
> Questions:
>
> 1. Do you agree that a systematic way to report what a DDL command (or
> script, or transaction) is going to do on your production database
> is a feature we should provide to our growing user base?

I would say it is late to the game and a great feature.

> 2. What do you think such a feature should look like?

I liked the other post that said: EXPLAIN <ALTER TABLE> or whatever.
Heck, it could even be useful to have EXPLAIN ANALYZE <ALTER TABLE> in
case people want to run it on staging/test/dev environments to judge
impact.

> 3. Does it make sense to support the whole set of DDL commands from the
> get go (or ever) when most of them are only taking locks in their
> own pg_catalog entry anyway?

I would think that introducing this incrementally makes sense.

JD

--
Command Prompt, Inc. - http://www.commandprompt.com/  503-667-4564
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, @cmdpromptinc

"If we send our children to Caesar for their education, we should not
be surprised when they come back as Romans."
* Dimitri Fontaine (dimitri@2ndQuadrant.fr) wrote:
> 1. Do you agree that a systematic way to report what a DDL command (or
> script, or transaction) is going to do on your production database
> is a feature we should provide to our growing user base?

I definitely like the idea of such a 'dry-run' kind of operation to
get an idea of what would happen.

> 2. What do you think such a feature should look like?

My thinking is that this would be implemented as a new kind of
read-only transaction type.

> 3. Does it make sense to support the whole set of DDL commands from the
> get go (or ever) when most of them are only taking locks in their
> own pg_catalog entry anyway?

On the fence about this one. In general, I'd say "yes", but I've not
looked at every case and I imagine there are DDL commands which really
aren't all that interesting for this case.

> Provided that we are able to converge towards a common enough answer to
> those questions, I propose to hack my way around and send patches to
> have it (the common answer) available in the next PostgreSQL release.

That feels a bit ambitious, given that we've not really nailed down
the feature definition yet, but I do like where you're going. :)

Thanks!

Stephen
* Harold Giménez (harold@heroku.com) wrote:
> I think the main issue is when a table rewrite is triggered on a DDL
> command on a large table, as this is what frequently leads to
> unavailability. The idea of introducing a NOREWRITE keyword to DDL
> commands then came up (credit: Peter Geoghegan). When the NOREWRITE
> keyword is used and the DDL statement would rewrite the table, the
> command errors and exits.
>
> [snip]
>
> I like this a lot just because it's simple, limited in scope, and can
> be easily integrated into ORMs, saving users hours of downtime and
> frustration.
>
> Thoughts?

Not against it, but feels like an independent thing to consider: what
Dimitri is suggesting is broader and encompasses the issue of locks,
which are certainly important to consider also.

In short, seems like having both would be worthwhile.

Thanks,

Stephen
* Joshua D. Drake (jd@commandprompt.com) wrote:
> > 2. What do you think such a feature should look like?
>
> I liked the other post that said: EXPLAIN <ALTER TABLE> or whatever.
> Heck, it could even be useful to have EXPLAIN ANALYZE <ALTER TABLE>
> in case people want to run it on staging/test/dev environments to
> judge impact.

The downside of the 'explain' approach is that the script then has to
be modified to put 'explain' in front of everything and then you have
to go through each statement and consider it. Having a 'dry-run'
transaction type which then produces a report at the end feels like
it'd be both easier to assess the overall implications, and less
error-prone as you don't have to prefix every statement with
'explain'. It might even be possible to have the local "view" of
post-alter statements be available inside of this 'dry-run' option-
that is, if you add a column in the transaction then the column exists
to the following commands, so it doesn't just error out. Having
'explain <whatever>' wouldn't give you that and so you really wouldn't
be able to have whole scripts run by just pre-pending each command
with 'explain'.

> > 3. Does it make sense to support the whole set of DDL commands from the
> > get go (or ever) when most of them are only taking locks in their
> > own pg_catalog entry anyway?
>
> I would think that introducing this incrementally makes sense.

Agreed.

Thanks,

Stephen
> Questions:
>
> 1. Do you agree that a systematic way to report what a DDL command (or
> script, or transaction) is going to do on your production database
> is a feature we should provide to our growing user base?

Yes.

> 2. What do you think such a feature should look like?

As with others, I think EXPLAIN is a good way to do this without
adding a keyword. So you'd do:

    EXPLAIN
    ALTER TABLE ....

... and it would produce a bunch of actions, available in either text
or JSON formats. For example:

    { locks : [
        { lock_type: relation,
          relation: table1,
          lock_mode: ACCESS EXCLUSIVE },
        { lock_type: transaction },
        { lock_type: catalog,
          catalogs: [pg_class, pg_attribute, pg_statistic],
          lock_mode: EXCLUSIVE }
    ] }
    { writes : [
        { object: relation files, action: rewrite },
        { object: catalogs, action: update }
    ] }

... etc. Would need a lot of refinement, but you get the idea.

> 3. Does it make sense to support the whole set of DDL commands from the
> get go (or ever) when most of them are only taking locks in their
> own pg_catalog entry anyway?

Well, eventually we'd want to support all of them just to avoid having
things be weird for users. However, here's a priority order:

    ALTER TABLE
    CREATE TABLE
    DROP TABLE
    ALTER VIEW
    CREATE VIEW
    CREATE INDEX
    DROP INDEX

... since all of the above can have unexpected secondary effects on
locking. For example, if you create a table with FKs it will take an
ACCESS EXCLUSIVE lock on the FK targets. And if you DROP a partition,
it takes an A.E. lock on the parent table.

> Provided that we are able to converge towards a common enough answer to
> those questions, I propose to hack my way around and send patches to
> have it (the common answer) available in the next PostgreSQL release.

Great!

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
On 10/02/2014 06:30 PM, Dimitri Fontaine wrote:
> Hi fellow hackers,
> [snip]
> Questions:
>
> 1. Do you agree that a systematic way to report what a DDL command (or
> script, or transaction) is going to do on your production database
> is a feature we should provide to our growing user base?

Yes, please.

> 2. What do you think such a feature should look like?

EXPLAIN [(verbose, format)] [DDL_COMMAND]

as in:

    EXPLAIN (verbose on, format text, impact on)
    ALTER TABLE emp ADD COLUMN foo2 jsonb NOT NULL DEFAULT '{}';

where the output would include something like:

    ...
    EXCLUSIVE LOCK ON TABLE emp;          // due to "IMPACT ON"
    REWRITE TABLE emp due to adding column foo2
        (default='{}'::jsonb)             // due to "VERBOSE ON"
    ...

> 3. Does it make sense to support the whole set of DDL commands from the
> get go (or ever) when most of them are only taking locks in their
> own pg_catalog entry anyway?

For completeness' sake, yes. But, unless the "impact" and "verbose"
modifiers are specified, most would be quite self-explanatory:

    EXPLAIN (verbose on, impact on) TRUNCATE TABLE emp;
    Execution plan:
    -> EXCLUSIVE LOCK ON TABLE emp;
    ....
    -> truncate index: IIIIII (file=NNNNN)            // NNNNN = relfilenode
    -> truncate main fork: NNNNN (tablespace: TTTTT)  // NNNNN = relfilenode
    -> truncate visibility map
    ....
    -> RELEASE LOCK ON TABLE emp;
    ....
    Summary: ZZZZZ pages (MMM MB) would be freed

versus a simple:

    EXPLAIN TRUNCATE TABLE emp;
    Execution plan:
    -> truncate index: emp_pkey
    -> truncate index: emp_foo2_idx
    -> truncate relation emp

> Provided that we are able to converge towards a common enough answer to
> those questions, I propose to hack my way around and send patches to
> have it (the common answer) available in the next PostgreSQL release.

Sounds very good, indeed. Count on me as tester :)

--
José Luis Tallón
> EXPLAIN ALTER TABLE ?

Good thing: People recognize it.
Bad thing: People might not be able to tell the difference between a
DDL and DML result.

What about "EXPLAIN DDL ..."?

The extra keyword ("DDL") makes it a bit more explicit that the
results are not comparable to the standard explain output.

--
Steven Lembark                                       3646 Flora Pl
Workhorse Computing                             St Louis, MO 63110
lembark@wrkhors.com                                +1 888 359 3508
On Thu, Oct 2, 2014 at 4:40 PM, Stephen Frost <sfrost@snowman.net> wrote:
> * Joshua D. Drake (jd@commandprompt.com) wrote:
>> I liked the other post that said: EXPLAIN <ALTER TABLE> or whatever.
>> [snip]
>
> The downside of the 'explain' approach is that the script then has to be
> modified to put 'explain' in front of everything and then you have to go
> through each statement and consider it. Having a 'dry-run' transaction
> type which then produces a report at the end feels like it'd be both
> easier to assess the overall implications, and less error-prone as you
> don't have to prefix every statement with 'explain'. It might even be
> possible to have the local "view" of post-alter statements be available
> inside of this 'dry-run' option- that is, if you add a column in the
> transaction then the column exists to the following commands, so it
> doesn't just error out. Having 'explain <whatever>' wouldn't give you
> that and so you really wouldn't be able to have whole scripts run by
> just pre-pending each command with 'explain'.

That sounds extremely complex. You'd have to implement the fake
columns, foreign keys, indexes, etc on most execution nodes, the
planner, and even system views.

IMO, dry-run per se, is a BEGIN; stuff; ROLLBACK. But that still needs
locks. I don't think you can simulate the side effects without locks,
so getting the local view of changes will be extremely difficult
unless you limit the scope considerably.
On Thu, Oct 2, 2014 at 12:40 PM, Stephen Frost <sfrost@snowman.net> wrote:
> The downside of the 'explain' approach is that the script then has to be
> modified to put 'explain' in front of everything and then you have to go
> through each statement and consider it. Having a 'dry-run' transaction
> type which then produces a report at the end feels like it'd be both
> easier to assess the overall implications, and less error-prone as you
> don't have to prefix every statement with 'explain'.
> [snip]

It's kind of tricky to implement a patch to figure this out ahead of
time. Some of the actual lock acquisitions are well hidden, in terms
of how the code is structured. In other cases, it may not even be
possible to determine ahead of time exactly what locks will be taken.

As Harold mentioned, another idea along the same lines would be to
decorate DDL with a NOWAIT "no locking assertion" and/or "no rewrite
assertion". Basically, if this DDL (or perhaps any DDL, if this is
implemented as a GUC instead) necessitates a table rewrite (and
requires an AccessExclusiveLock), throw an error. That's the case that
most people care about.

This may not even be good enough, though. Consider:

Session 1 is a long running transaction. Maybe it's a spurious
idle-in-transaction situation, but it could also be totally
reasonable. It holds an AccessShareLock on some relation, as long
running transactions are inclined to do.

Session 2 is our migration. It needs an AccessExclusiveLock to ALTER
TABLE on the same relation (or whatever). But it doesn't need a
rewrite, which is good. It comes along and attempts to acquire the
lock, blocking on session 1.

Session 3 is an innocent bystander. It goes to query the same table in
an ordinary, routine way - a SELECT statement. Even though session 2's
lock is not granted yet, session 3 is not at liberty to skip the queue
and get its own AccessShareLock. The effect is about the same as if
session 2 did need to hold an AccessExclusiveLock for ages: read
queries block for a long time. And yet, in theory session 2's impact
on production should not be minimal, if we consider something like
EXPLAIN output.

Why is NOWAIT only supported for SET TABLESPACE? I guess it's just a
particularly bad case. NOWAIT might be the wrong thing for DDL
generally.

--
Peter Geoghegan
On Thu, Oct 2, 2014 at 1:37 PM, Peter Geoghegan <pg@heroku.com> wrote:
> And yet, in theory session 2's impact
> on production should not be minimal, if we consider something like
> EXPLAIN output.

"Should have been minimal", I mean.

--
Peter Geoghegan
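Peter's three-session scenario is easy to reproduce by hand. The table
name below is invented, and any ALTER TABLE that needs ACCESS
EXCLUSIVE will do, even a catalog-only one:

    -- Session 1: a long-running transaction holding ACCESS SHARE.
    BEGIN;
    SELECT count(*) FROM accounts;
    -- ...stays idle in transaction...

    -- Session 2: the migration. Catalog-only, no rewrite, but it
    -- still needs ACCESS EXCLUSIVE, so it queues behind session 1.
    ALTER TABLE accounts ADD COLUMN note text;

    -- Session 3: an ordinary reader. Its ACCESS SHARE request queues
    -- behind session 2's ungranted ACCESS EXCLUSIVE, so it blocks too.
    SELECT * FROM accounts WHERE id = 1;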
On Thu, Oct 2, 2014 at 5:37 PM, Peter Geoghegan <pg@heroku.com> wrote:
> Session 3 is an innocent bystander. It goes to query the same table in
> an ordinary, routine way - a SELECT statement. Even though session 2's
> lock is not granted yet, session 3 is not at liberty to skip the queue
> and get its own AccessShareLock. The effect is about the same as if
> session 2 did need to hold an AccessExclusiveLock for ages: read
> queries block for a long time. And yet, in theory session 2's impact
> on production should not be minimal, if we consider something like
> EXPLAIN output.

The explain would show the AccessExclusiveLock, so it would be enough
for a heads-up to kill all idle-in-transaction sessions holding locks
on the target relation (if killable, or just wait).

Granted, it's something that's not easily automatable, whereas a
nowait is.

However, rather than nowait, I'd prefer "cancellable" semantics, that
would cancel voluntarily if any other transaction requests a
conflicting lock, like autovacuum does.
* Claudio Freire (klaussfreire@gmail.com) wrote:
> On Thu, Oct 2, 2014 at 4:40 PM, Stephen Frost <sfrost@snowman.net> wrote:
> > The downside of the 'explain' approach is that the script then has to be
> > modified to put 'explain' in front of everything and then you have to go
> > through each statement and consider it.
> > [snip]
>
> That sounds extremely complex. You'd have to implement the fake
> columns, foreign keys, indexes, etc on most execution nodes, the
> planner, and even system views.

Eh? We have MVCC catalog access.

> IMO, dry-run per se, is a BEGIN; stuff; ROLLBACK. But that still needs
> locks. I don't think you can simulate the side effects without locks,

Why? If you know the transaction is going to roll back and you only
add entries to the catalog which aren't visible to any other
transactions than your own, and you make sure that nothing you do
actually writes data out which is visible to other transactions..

> so getting the local view of changes will be extremely difficult
> unless you limit the scope considerably.

I agree that there may be complexities, but I'm not sure this is
really the issue..

Thanks,

Stephen
On Thu, Oct 2, 2014 at 1:52 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
> The explain would show the AccessExclusiveLock, so it would be enough
> for a heads-up to kill all idle-in-transaction sessions holding locks
> on the target relation (if killable, or just wait).

I think that there are very few problems with recognizing when an
AccessExclusiveLock is needed or not needed. The exceptions to the
rule that DDL needs such a lock are narrow enough that I have a hard
time believing that most people think about it, or even need to think
about it. I wish that wasn't the case, but it is.

> Granted, it's something that's not easily automatable, whereas a
> nowait is.
>
> However, rather than nowait, I'd prefer "cancellable" semantics, that
> would cancel voluntarily if any other transaction requests a
> conflicting lock, like autovacuum does.

I think the problem you'll have with NOWAIT is: you have an error from
having to wait...what now? Do you restart? I imagine this would
frequently result in what is effectively lock starvation. Any old
AccessShareLock-er is going to make our migration tool restart. We'll
never finish.

--
Peter Geoghegan
On Thu, Oct 2, 2014 at 6:00 PM, Peter Geoghegan <pg@heroku.com> wrote:
> I think the problem you'll have with NOWAIT is: you have an error from
> having to wait...what now? Do you restart? I imagine this would
> frequently result in what is effectively lock starvation. Any old
> AccessShareLock-er is going to make our migration tool restart. We'll
> never finish.

I've done that manually (throw the DDL, and cancel if it takes more
than a couple of seconds) on modest but relatively busy servers with
quite some success.
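What Claudio describes doing by hand is close to what the lock_timeout
setting (available since 9.3) automates. A sketch, with an invented
table:

    -- Give up on the DDL if its locks aren't granted within 2
    -- seconds, instead of queueing indefinitely and blocking the
    -- readers queued behind us.
    SET lock_timeout = '2s';
    ALTER TABLE accounts ADD COLUMN note text;
    -- ERROR:  canceling statement due to lock timeout  (if contended)
    RESET lock_timeout;

Note that this still queues, and blocks other sessions, for up to the
timeout; unlike the "cancellable" semantics proposed above, it does
not yield as soon as a conflicting lock is requested.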
On Thu, Oct 2, 2014 at 6:03 PM, Stephen Frost <sfrost@snowman.net> wrote:
> * Claudio Freire (klaussfreire@gmail.com) wrote:
>> That sounds extremely complex. You'd have to implement the fake
>> columns, foreign keys, indexes, etc on most execution nodes, the
>> planner, and even system views.
>
> Eh? We have MVCC catalog access.

And that needs locks, especially if you modify the underlying
filesystem layout.

>> IMO, dry-run per se, is a BEGIN; stuff; ROLLBACK. But that still needs
>> locks. I don't think you can simulate the side effects without locks,
>
> Why? If you know the transaction is going to roll back and you only
> add entries to the catalog which aren't visible to any other
> transactions than your own, and you make sure that nothing you do
> actually writes data out which is visible to other transactions..

But that's not the scope. If you want a dry-run of table-rewriting
DDL, or DDL interspersed with DML like:

    alter table blargh add foo integer;
    update blargh set foo = coalesce(bar, baz);

You really cannot hope not to have to write data. The above is also
the case with defaulted columns, btw.

>> so getting the local view of changes will be extremely difficult
>> unless you limit the scope considerably.
>
> I agree that there may be complexities, but I'm not sure this is
> really the issue..

In essence, if you want MVCC catalog access without AEL, you're in for
a rough ride. I'm not as experienced with pg's core as you, so you
tell me, but I imagine it will be the case.
On Thu, Oct 2, 2014 at 2:04 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
> I've done that manually (throw the DDL, and cancel if it takes more
> than a couple of seconds) on modest but relatively busy servers with
> quite some success.

Fair enough, but that isn't the same as NOWAIT. It's something we'd
have a hard time coming up with a general-purpose timeout for.

--
Peter Geoghegan
* Peter Geoghegan (pg@heroku.com) wrote:
> It's kind of tricky to implement a patch to figure this out ahead of
> time. Some of the actual lock acquisitions are well hidden, in terms
> of how the code is structured. In other cases, it may not even be
> possible to determine ahead of time exactly what locks will be taken.

I was thinking this would be a new kind of transaction and we'd have
to teach parts of the system about it- yes, that's pretty invasive,
but it's at least one approach to consider.

> As Harold mentioned, another idea along the same lines would be to
> decorate DDL with a NOWAIT "no locking assertion" and/or "no rewrite
> assertion". Basically, if this DDL (or perhaps any DDL, if this is
> implemented as a GUC instead) necessitates a table rewrite (and
> requires an AccessExclusiveLock), throw an error. That's the case that
> most people care about.

The problem I see with this approach is outlined above. I agree that
it may be independently valuable, but I don't see it as being a
solution to the issue.

> This may not even be good enough, though. Consider:
>
> [snip]
>
> Why is NOWAIT only supported for SET TABLESPACE? I guess it's just a
> particularly bad case. NOWAIT might be the wrong thing for DDL
> generally.

I agree that this is a concern, but this feels to me like a next step
on top of the "assess the locks required" transaction type which I am
trying to outline.

Specifically, having a way to take the report of what locks are going
to be required and then actually attempt to acquire them all (or fail
if any can't be granted immediately) would be a natural next step and
a way to start off the actual migration script- either all get
acquired and the script runs to completion, or a lock isn't granted
and the whole thing fails immediately without anything actually being
done.

Thanks,

Stephen
On 2014-10-02 17:03:59 -0400, Stephen Frost wrote:
> * Claudio Freire (klaussfreire@gmail.com) wrote:
> > That sounds extremely complex. You'd have to implement the fake
> > columns, foreign keys, indexes, etc on most execution nodes, the
> > planner, and even system views.
>
> Eh? We have MVCC catalog access.

So you want to modify the catalog without actually doing the
corresponding actions? That'll be heck of invasive. With changes all
over the backend. We'll need to remove error checks (like for the
existence of relfilenodes), remove rewriting, and such.

Greetings,

Andres Freund

--
Andres Freund                      http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 2014-10-02 13:49:36 -0300, Claudio Freire wrote:
> EXPLAIN ALTER TABLE ?

I don't think that'll work - there's already EXPLAIN for some CREATEs.
At least CREATE TABLE ... AS, CREATE MATERIALIZED VIEW ... AS and
SELECT INTO.

Greetings,

Andres Freund

--
Andres Freund                      http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
* Claudio Freire (klaussfreire@gmail.com) wrote:
> On Thu, Oct 2, 2014 at 6:03 PM, Stephen Frost <sfrost@snowman.net> wrote:
> > Eh? We have MVCC catalog access.
>
> And that needs locks, especially if you modify the underlying
> filesystem layout.

And we wouldn't be doing that, certainly. It's a dry-run.

> But that's not the scope. If you want a dry-run of table-rewriting
> DDL, or DDL interspersed with DML like:
>
>     alter table blargh add foo integer;
>     update blargh set foo = coalesce(bar, baz);
>
> You really cannot hope not to have to write data. The above is also
> the case with defaulted columns, btw.

The point is to not write anything which is visible to other
transactions, which means we'd have to put DML into some different
'mode' which doesn't actually write where other processes might be
looking. I'm not saying it's trivial to do, but I don't think it's
impossible either. We might also be able to simply get away with
short-circuiting them and not actually writing anything (and the same
for reading data..). What would probably be useful is to review actual
migration scripts and see if this would really work. I know they'd
work for at least a subset of the migration scripts that I've dealt
with before, and also not all of them.

> In essence, if you want MVCC catalog access without AEL, you're in for
> a rough ride. I'm not as experienced with pg's core as you, so you
> tell me, but I imagine it will be the case.

It's not clear to me what you're getting at as the 'rough' part,
exactly..

Thanks,

Stephen
* Andres Freund (andres@2ndquadrant.com) wrote:
> So you want to modify the catalog without actually doing the
> corresponding actions? That'll be heck of invasive. With changes all
> over the backend. We'll need to remove error checks (like for the
> existence of relfilenodes), remove rewriting, and such.

Yeah, I was getting at it being rather invasive earlier. It really
depends on exactly what we'd support in this mode, which would depend
on just what would be invasive and what wouldn't, I expect. I dislike
the idea of not being able to actually run a real migration script
though, as anything else opens the very real possibility that the real
script and the 'explain' script don't do the same thing, making this
capability not nearly as useful..

Thanks,

Stephen
On Thu, Oct 2, 2014 at 6:19 PM, Stephen Frost <sfrost@snowman.net> wrote:
> * Claudio Freire (klaussfreire@gmail.com) wrote:
>> And that needs locks, especially if you modify the underlying
>> filesystem layout.
>
> And we wouldn't be doing that, certainly. It's a dry-run.
> ...
> (...) We might also be able to simply get away with
> short-circuiting them and not actually writing anything (and the same
> for reading data..). What would probably be useful is to review actual
> migration scripts and see if this would really work. I know they'd
> work for at least a subset of the migration scripts that I've dealt
> with before, and also not all of them.

I believe, for it to be reasonably unintrusive, you'd have to abort at
the first need to actually read/write data.

Most of my migration scripts, especially the ones that would benefit
from this, require some of that, but that's just my personal common
practice, not a general case.

> The point is to not write anything which is visible to other
> transactions, which means we'd have to put DML into some different
> 'mode' which doesn't actually write where other processes might be
> looking. I'm not saying it's trivial to do, but I don't think it's
> impossible either. (...)

No, I don't think it's impossible either. Just very, very
time-consuming. Both in developing the patch and in its eventual
maintenance.

TBH, a separate read-only transaction like explain alter would also be
quite difficult to keep in sync with actual alter logic, unless it's
handled by the same code (unlikely in that form).

> It's not clear to me what you're getting at as the 'rough' part,
> exactly..

A lot of work, touching most of the codebase, and hard to maintain in
the end.

Unless you limit the scope, as you said, just touching the catalog,
not data, could be doable. But it would act as an implicit
"norewrite".
Peter Geoghegan wrote:
> Why is NOWAIT only supported for SET TABLESPACE? I guess it's just a
> particularly bad case. NOWAIT might be the wrong thing for DDL
> generally.

I didn't realize we supported NOWAIT for SET TABLESPACE. My hunch is
that if we have that, it should really be supported for anything that
does a table rewrite, or perhaps even anything that requires a full
table scan (such as adding a new CHECK constraint).

OTOH it does seem a wart to have NOWAIT in ALTER TABLE generally. You
can get the same effect by doing a LOCK TABLE .. NOWAIT and then ALTER
TABLE, right?

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
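For reference, the pattern Álvaro describes looks like this (table
name invented; LOCK must run inside a transaction block):

    BEGIN;
    -- Fail immediately if the lock can't be granted, rather than
    -- queueing up and blocking other sessions behind us.
    LOCK TABLE accounts IN ACCESS EXCLUSIVE MODE NOWAIT;
    ALTER TABLE accounts ADD COLUMN note text;
    COMMIT;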
Dimitri Fontaine wrote:
> The main practical example I can offer here is the ALTER TABLE command.
> Recent releases are including very nice optimisations to it, so much so
> that it's becoming increasingly hard to answer some very basic
> questions:
>
> - what kind of locks will be taken? (exclusive, shared)
> - on what objects? (foreign keys, indexes, sequences, etc)
> - will the table have to be rewritten? the indexes?

Please give my DDL deparsing patch a look. There is a portion there
about deparsing ALTER TABLE specifically; what it does is save a list
of subcommands, and for each of them we either report the OID of the
object affected (for example in ADD CONSTRAINT), or a column number
(for ALTER COLUMN RENAME, say). It sounds like you would like to have
some extra details returned: for instance the "does the whole of it
require a table rewrite" bit. It sounds like it can be trivially
returned in the JSON object as a boolean (not sure if it would be
attached to each individual subcommand, or at the global level), even
if it doesn't play a role in the "fmt" string --- similar to how we're
labelling different kinds of constraints in an ADD CONSTRAINT
subcommand with the type of constraint being added.

Not real sure if the functionality you propose would be just returned
in the JSON object (so the user would be in charge of extracting it,
or perhaps we provide additional auxiliary functions that examine the
JSON), or separately. Elsewhere Pavel was complaining that JSON is not
plpgsql-friendly; something to consider here too.

Anyway if you're able to help the DDL deparse patch by implementing
some of the missing commands (which could help you with your proposal
here), that would be much welcome.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>> - will the table have to be rewritten? the indexes?
>
> Please give my DDL deparsing patch a look. There is a portion there
> about deparsing ALTER TABLE specifically; what it does is save a list
> of subcommands, and for each of them we either report the OID of the
> object affected (for example in ADD CONSTRAINT), or a column number
> (for ALTER COLUMN RENAME, say). It sounds like you would like to have
> some extra details returned: for instance the "does the whole of it
> require a table rewrite" bit.

Some years ago when working on the Event Trigger framework we did
mention providing some interesting events, such as a TableRewrite
Event. In between what you're saying here and what Harold and Peter
Geoghegan are mentioning (basically that dealing with table rewrites
is 90% of the need for them), it could be that the best way to have at
it would be to add that Event in the Event Trigger mechanism.

We could also add an AccessExclusiveLock Event that would fire just
before actually taking the lock, allowing people to RAISE EXCEPTION in
that case, or to maybe just do the LOCK … NOWAIT themselves in the
trigger.

For the locking parts, best would be to do the LOCK … NOWAIT dance for
all the tables touched by the DDL migration script. The Event Trigger
approach will not solve that, unfortunately.

Regards,
--
Dimitri Fontaine                              06 63 07 10 78
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support
On 10/02/2014 01:15 PM, Joe Conway wrote:
> On 10/02/2014 11:30 AM, Dimitri Fontaine wrote:
> [snip]
>> 3. Does it make sense to support the whole set of DDL commands from
>> the get go (or ever) when most of them are only taking locks in
>> their own pg_catalog entry anyway?
>
> Yes, I think it should cover all commands that can have an
> availability impact.

In principle I agree with the sentiment. However, that full coverage
is a nice goal, seldom achieved.

The real question is at what level of information, returned to the
user, does this feature become user friendly?

It is one thing to provide information of the kind of

    TAKE ACCESS EXCLUSIVE LOCK ON TABLE foo
    TEST EVERY ROW IN TABLE foo FOR FK (a, b) IN bar (id1, id2)

That information is useful, but only to an experienced DBA who knows
their schema and data to a certain degree. The majority of users, I
fear, will not be able to even remotely guesstimate if that will need
seconds or hours.

There needs to be more detailed information for those cases and I
believe that tackling them one at a time in depth will lead to more
useful results than trying to cover a lot but shallow.

My $.02

Jan

--
Jan Wieck
Senior Software Engineer
http://slony.info
On 10/02/2014 06:43 PM, Jan Wieck wrote:
> The real question is at what level of information, returned to the
> user, does this feature become user friendly?
>
> It is one thing to provide information of the kind of
>
>     TAKE ACCESS EXCLUSIVE LOCK ON TABLE foo
>     TEST EVERY ROW IN TABLE foo FOR FK (a, b) IN bar (id1, id2)
>
> That information is useful, but only to an experienced DBA who
> knows their schema and data to a certain degree. The majority of
> users, I fear, will not be able to even remotely guesstimate if
> that will need seconds or hours.
>
> There needs to be more detailed information for those cases and I
> believe that tackling them one at a time in depth will lead to more
> useful results than trying to cover a lot but shallow.

Perhaps. This and other posts on this thread make me wonder if some
kind of extension to the stats collector and pg_stat_statement might
be a good place to start?

Joe

--
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support
* Claudio Freire (klaussfreire@gmail.com) wrote:
> I believe, for it to be reasonably unintrusive, you'd have to abort
> at the first need to actually read/write data.
>
> Most of my migration scripts, especially the ones that would benefit
> from this, require some of that, but that's just my personal common
> practice, not a general case.

I realize we're moving into real pie-in-the-sky ideas here, but I
wonder if we could have a way to either create a new 'delta' fork
which maintains the pages which have been changed.. Or perhaps
actually have a new relfilenode with a reference back to the original
for any empty pages..

> No, I don't think it's impossible either. Just very, very
> time-consuming. Both in developing the patch and in its eventual
> maintenance.

I agree that we don't want a completely independent code path for this
or it won't be as useful- and that's a concern for the EXPLAIN idea
too.

> TBH, a separate read-only transaction like explain alter would also be
> quite difficult to keep in sync with actual alter logic, unless it's
> handled by the same code (unlikely in that form).

Right.

Thanks,

Stephen
On 10/2/14, 2:43 PM, Josh Berkus wrote:
>> Questions:
>>
>> 1. Do you agree that a systematic way to report what a DDL command (or
>> script, or transaction) is going to do on your production database
>> is a feature we should provide to our growing user base?
>
> Yes.

+1

>> 2. What do you think such a feature should look like?
>
> As with others, I think EXPLAIN is a good way to do this without
> adding a keyword. So you'd do:
>
>     EXPLAIN
>     ALTER TABLE ....

I'm thinking it would be better to have something you could set at a
session level, so you don't have to stick EXPLAIN in front of all your
DDL.

As for the dry-run idea, I don't think that's really necessary. I've
never seen anyone serious that doesn't have a development environment,
which is where you would simply deploy the real DDL using "verbose"
mode and see what the underlying commands actually do.
On Fri, Oct 3, 2014 at 12:07 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
> I'm thinking it would be better to have something you could set at a
> session level, so you don't have to stick EXPLAIN in front of all your
> DDL.
>
> As for the dry-run idea, I don't think that's really necessary. I've
> never seen anyone serious that doesn't have a development environment,
> which is where you would simply deploy the real DDL using "verbose"
> mode and see what the underlying commands actually do.

Well, the thing that's difficult to reproduce on a development
environment is the locking issues. It may run perfectly on an idle
system, only to lock up forever (or unacceptably long) on a busy one.

That is, probably, one concern that can be attacked relatively cheaply
with some clever locking / cancelling logic.
Jim Nasby <Jim.Nasby@BlueTreble.com> writes:
>> EXPLAIN
>> ALTER TABLE ....
>
> I'm thinking it would be better to have something you could set at a
> session level, so you don't have to stick EXPLAIN in front of all
> your DDL.

Yeah, I'm coming into that camp too, and I think the Event Trigger
idea gets us halfway there. Here's a detailed sketch of how it would
work:

1. Preparatory steps: install the Event Trigger

       create extension norewrite;

2. Test run:

       psql -1 -f ddl.sql
       ERROR: Table Rewrite has been cancelled.

3. Well, actually we need to run that thing in production:

       BEGIN;
       ALTER EVENT TRIGGER norewrite DISABLE;
       \i ddl.sql
       ALTER EVENT TRIGGER norewrite ENABLE;
       COMMIT;

Then it's also possible to have another Event Trigger that would
automatically issue a LOCK <table> NOWAIT; command before any DDL
against a table is run, in another extension:

    create extension ddl_lock_nowait;

The same applies: if your production rollout is blocked repeatedly and
you want to force it through at some point, it's possible to disable
the event trigger within the DDL script/transaction.

> As for the dry-run idea, I don't think that's really necessary. I've
> never seen anyone serious that doesn't have a development environment,
> which is where you would simply deploy the real DDL using "verbose"
> mode and see what the underlying commands actually do.

The major drawback of the Event Trigger idea is that the transaction
is cancelled as soon as a Rewrite Event is fired when you have
installed the protective trigger. It means that you won't see the next
problem after the first one, so it's not a dry-run.

But considering what you're saying here, it might well be enough.

Regards,
--
Dimitri Fontaine                              06 63 07 10 78
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support
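A sketch of what the norewrite extension's trigger might contain. This
assumes a table_rewrite event and a pg_event_trigger_table_rewrite_oid()
helper, neither of which existed when this thread was written
(PostgreSQL 9.5 later added both):

    -- Abort any DDL command that is about to rewrite a table.
    CREATE FUNCTION abort_any_rewrite() RETURNS event_trigger
    LANGUAGE plpgsql AS $$
    BEGIN
      RAISE EXCEPTION 'Table Rewrite of % has been cancelled.',
        pg_event_trigger_table_rewrite_oid()::regclass;
    END;
    $$;

    -- Fires just before a rewrite would start, so nothing is done yet.
    CREATE EVENT TRIGGER norewrite ON table_rewrite
      EXECUTE PROCEDURE abort_any_rewrite();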
On Thu, Oct 2, 2014 at 5:22 PM, Stephen Frost <sfrost@snowman.net> wrote:
> * Andres Freund (andres@2ndquadrant.com) wrote:
>> So you want to modify the catalog without actually doing the
>> corresponding actions? That'll be heck of invasive. With changes all
>> over the backend. We'll need to remove error checks (like for the
>> existence of relfilenodes), remove rewriting, and such.
>
> Yeah, I was getting at it being rather invasive earlier. It really
> depends on exactly what we'd support in this mode, which would depend
> on just what would be invasive and what wouldn't, I expect. I dislike
> the idea of not being able to actually run a real migration script
> though, as anything else opens the very real possibility that the real
> script and the 'explain' script don't do the same thing, making this
> capability not nearly as useful..

I think this is the real issue. Somebody's got to come up with an
infrastructure for reporting locks to be taken without actually
performing the corresponding actions, and it's got to go through
basically the same code paths as it would if we really did it. That's
going to be complicated. But if we make the EXPLAIN code completely
separate from the real code, it will be easy for them to get out of
sync with each other, and then the answers will be wrong.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
* Jim Nasby (Jim.Nasby@BlueTreble.com) wrote:
> I'm thinking it would be better to have something you could set at a
> session level, so you don't have to stick EXPLAIN in front of all your
> DDL.

Right, I'm agreed there.

> As for the dry-run idea, I don't think that's really necessary. I've
> never seen anyone serious that doesn't have a development environment,
> which is where you would simply deploy the real DDL using "verbose"
> mode and see what the underlying commands actually do.

That's certainly an interesting point and perhaps what we'd do is,
instead, have a "collect info on locks needed" mode- but otherwise,
let everything run as-is. You could then take the report at the end of
the transaction and use it to identify what would be needed in
production and maybe even have a script created which grabs all the
locks using 'nowait' or fails the whole thing if it isn't possible..

Of course, we kind of have that already... Just look at the locks
you've acquired at the end of the transaction..

Thanks,

Stephen
On 10/03/2014 11:02 AM, Dimitri Fontaine wrote:
> Jim Nasby <Jim.Nasby@BlueTreble.com> writes:
>>> EXPLAIN
>>> ALTER TABLE ....
>> I'm thinking it would be better to have something you could set at a
>> session level, so you don't have to stick EXPLAIN in front of all
>> your DDL.

We were considering the potential needs of "accidental DBAs" here at
first, if memory serves me well.

I definitively see the value of EXPLAIN [DDL STATEMENT]... even if
implemented as a regular "this is what should happen", without even
attempting to run a thing (not even "dry run transactions"), but
including the full catalog search / attribute resolution etc.

Providing insight on the locking that would happen sounds like a real
life-saver for many real life situations (i.e. would this ALTER
completely lock my application due to the constant logging-to-table?)

This is, obviously, IMHO.

> Yeah, I'm coming into that camp too, and I think the Event Trigger
> idea gets us halfway there.
> [snip]
>
> The same applies: if your production rollout is blocked repeatedly and
> you want to force it through at some point, it's possible to disable
> the event trigger within the DDL script/transaction.

This serves a different purpose which is, at least, as worthwhile as
the former: provide a real "dry run" mechanism for advanced users.

Stephen's "delta" fork sounds like a promising approach ... even if a
bit too "Oracle-ish" (sounds an awful lot like UNDO logs!) for my
liking.

>> As for the dry-run idea, I don't think that's really necessary. I've
>> never seen anyone serious that doesn't have a development environment,
>> which is where you would simply deploy the real DDL using "verbose"
>> mode and see what the underlying commands actually do.
>
> The major drawback of the Event Trigger idea is that the transaction
> is cancelled as soon as a Rewrite Event is fired when you have
> installed the protective trigger. It means that you won't see the next
> problem after the first one, so it's not a dry-run.
>
> But considering what you're saying here, it might well be enough.

It is a very convenient first step (minimally invasive, and good use
of existing infrastructure)... since it allows an easy testing phase
in order to iron out potential shortcomings and gather input on some
other applications.

Thanks,

/ Jose