Thread: pg_restore --no-post-data and --post-data-only
Attached is an undocumented patch that allows pg_restore to omit post-data items or omit all but post-data items. This has been discussed before, and Simon sent in a patch back on 2008, which has bitrotted some. I'm not sure why it was dropped at the time, but I think it's time to do this. This patch relies on some infrastructure that was added since Simon's patch, so it works a bit differently (and more simply). So with this patch, the following three sequences should be equivalent: pg_restore --no-post-data pg_restore --post-data-only pg_restore -s --no-post-data pg_restore -a pg_restore --post-data-only pg_restore This is useful and worth doing on its own, and will also add to the usefulness of the pg_dump --exclude-table-data patch in my previous email. As with that patch, a version that applies to version 9.0 and 8.4 sources is also attached, for the very eager. cheers andrew
Attachment
On 8/23/11 1:30 PM, Andrew Dunstan wrote: > > Attached is an undocumented patch that allows pg_restore to omit > post-data items or omit all but post-data items. This has been discussed > before, and Simon sent in a patch back on 2008, which has bitrotted > some. I'm not sure why it was dropped at the time, but I think it's time > to do this. This patch relies on some infrastructure that was added > since Simon's patch, so it works a bit differently (and more simply). If it's not clear from Andrew's description, the purpose of this patch is to allow dividing your pgdump into 3 portions: 1. schema 2. data 3. constraints/indexes This allows users to implement a number of custom solutions for ad-hoc parallel dump, conditional loading, data munging and sampled databases.While doing so was possible before using the manifestfrom pg_restore -l, the manifest approach has been complex to automate and relies on obscure knowledge. I have immediate production use for this patch and may be backporting it. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On 08/24/2011 08:43 PM, Josh Berkus wrote: > On 8/23/11 1:30 PM, Andrew Dunstan wrote: >> Attached is an undocumented patch that allows pg_restore to omit >> post-data items or omit all but post-data items. This has been discussed >> before, and Simon sent in a patch back on 2008, which has bitrotted >> some. I'm not sure why it was dropped at the time, but I think it's time >> to do this. This patch relies on some infrastructure that was added >> since Simon's patch, so it works a bit differently (and more simply). > If it's not clear from Andrew's description, the purpose of this patch > is to allow dividing your pgdump into 3 portions: > > 1. schema > 2. data > 3. constraints/indexes > > This allows users to implement a number of custom solutions for ad-hoc > parallel dump, conditional loading, data munging and sampled databases. > While doing so was possible before using the manifest from pg_restore > -l, the manifest approach has been complex to automate and relies on > obscure knowledge. > > I have immediate production use for this patch and may be backporting it. It's already backported, at least as far as 8.4. Check your email :-) cheers andrew
On Aug 24, 2011, at 7:43 PM, Josh Berkus wrote: > On 8/23/11 1:30 PM, Andrew Dunstan wrote: >> >> Attached is an undocumented patch that allows pg_restore to omit >> post-data items or omit all but post-data items. This has been discussed >> before, and Simon sent in a patch back on 2008, which has bitrotted >> some. I'm not sure why it was dropped at the time, but I think it's time >> to do this. This patch relies on some infrastructure that was added >> since Simon's patch, so it works a bit differently (and more simply). > > If it's not clear from Andrew's description, the purpose of this patch > is to allow dividing your pgdump into 3 portions: > > 1. schema > 2. data > 3. constraints/indexes > > This allows users to implement a number of custom solutions for ad-hoc > parallel dump, conditional loading, data munging and sampled databases. > While doing so was possible before using the manifest from pg_restore > -l, the manifest approach has been complex to automate and relies on > obscure knowledge. > > I have immediate production use for this patch and may be backporting it. FWIW, I got around this by writing a perl script that calls pg_dump -s and watches for the end of table create statements(IIRC it specifically looks for the first CREATE INDEX). The advantage to that approach is that you don't haveto first create a custom format dump and then run pg_restore against that. -- Jim C. Nasby, Database Architect jim@nasby.net 512.569.9461 (cell) http://jim.nasby.net
On 08/25/2011 06:05 PM, Jim Nasby wrote: > On Aug 24, 2011, at 7:43 PM, Josh Berkus wrote: >> On 8/23/11 1:30 PM, Andrew Dunstan wrote: >>> Attached is an undocumented patch that allows pg_restore to omit >>> post-data items or omit all but post-data items. This has been discussed >>> before, and Simon sent in a patch back on 2008, which has bitrotted >>> some. I'm not sure why it was dropped at the time, but I think it's time >>> to do this. This patch relies on some infrastructure that was added >>> since Simon's patch, so it works a bit differently (and more simply). >> If it's not clear from Andrew's description, the purpose of this patch >> is to allow dividing your pgdump into 3 portions: >> >> 1. schema >> 2. data >> 3. constraints/indexes >> >> This allows users to implement a number of custom solutions for ad-hoc >> parallel dump, conditional loading, data munging and sampled databases. >> While doing so was possible before using the manifest from pg_restore >> -l, the manifest approach has been complex to automate and relies on >> obscure knowledge. >> >> I have immediate production use for this patch and may be backporting it. > FWIW, I got around this by writing a perl script that calls pg_dump -s and watches for the end of table create statements(IIRC it specifically looks for the first CREATE INDEX). The advantage to that approach is that you don't haveto first create a custom format dump and then run pg_restore against that. > Well, notwithstanding my well known love of perl, that strikes me as spending a pound to save a penny. And custom format dumps rock ;-) Also, your recipe above is buggy, BTW. A CREATE INDEX statement might well not be the first item in the post-data section. But we could also add these switches to pg_dump too if people feel it's worthwhile. I haven't looked but the logic should not be terribly hard. cheers andrew
On 08/25/2011 06:15 PM, Andrew Dunstan wrote: > > > But we could also add these switches to pg_dump too if people feel > it's worthwhile. I haven't looked but the logic should not be terribly > hard. > Something like the attached, in fact, which seems pretty simple. cheers andrew
Attachment
On Fri, Aug 26, 2011 at 11:22 AM, Andrew Dunstan <andrew@dunslane.net> wrote: >> But we could also add these switches to pg_dump too if people feel it's >> worthwhile. I haven't looked but the logic should not be terribly hard. > > Something like the attached, in fact, which seems pretty simple. It seems like there are three sets of things you might want here: pre-data, data, post-data. So in the end we could end up with: --pre-data-only --post-data-only --data-only --no-pre-data --no-post-data --no-data And then maybe someone will want just the create index commands and not the constraint commands. It seems like it might be more elegant to come up with a single switch where you can list which things you want: --sections='predata data' --sections='postdata' --sections='index' Just thinking out loud.... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Fri, 2011-08-26 at 12:46 -0400, Robert Haas wrote: > --sections='predata data' > --sections='postdata' > --sections='index' Agreed. After command line options reach a certain level of complexity, I think it's worth looking for a more general way to express them. Regards,Jeff Davis
On 08/26/2011 12:46 PM, Robert Haas wrote: > On Fri, Aug 26, 2011 at 11:22 AM, Andrew Dunstan<andrew@dunslane.net> wrote: >>> But we could also add these switches to pg_dump too if people feel it's >>> worthwhile. I haven't looked but the logic should not be terribly hard. >> Something like the attached, in fact, which seems pretty simple. > It seems like there are three sets of things you might want here: > pre-data, data, post-data. So in the end we could end up with: > > --pre-data-only > --post-data-only > --data-only > --no-pre-data > --no-post-data > --no-data > > And then maybe someone will want just the create index commands and > not the constraint commands. It seems like it might be more elegant > to come up with a single switch where you can list which things you > want: > > --sections='predata data' > --sections='postdata' > --sections='index' > > Just thinking out loud.... I knew there would be some bike-shedding about how we specify these things, which is why I haven't written docs yet. All the possibilities you specify except for the indexes section can be done by using these switches in combination with -s and -a. For anything more fine-grained, I'm inclined to say that people need to roll their own. pg_restore's --list and --use-list give you extremely fine-grained control. I have working scripts which use these for example to filter out londiste and pgq objects, certain large tables, audit objects and more. As an example of the complexity I think we should avoid, which section would UNIQUE and PRIMARY KEY constraints belong in? "constraints" because that's what they are, or "indexes" because that's what they create? No matter which answer you choose someone will claim you have violated POLA. Chopping things into pre-data, data and post-data would get us around 99% of the cases we could reasonably provide for in my experience. That seems enough :-) I don't have anything in principle against your '--sections="foo bar"' suggestion, but it would be more work to program. Simpler, and probably more consistent with how we do other things, would be allowing multiple --section options, if we don't want to have named options such as I have provided. cheers andrew
On Fri, Aug 26, 2011 at 1:15 PM, Andrew Dunstan <andrew@dunslane.net> wrote: > I don't have anything in principle against your '--sections="foo bar"' > suggestion, but it would be more work to program. Simpler, and probably more > consistent with how we do other things, would be allowing multiple --section > options, if we don't want to have named options such as I have provided. I wouldn't object to that, but "more work to program" probably means about an extra 10 lines of code in this particular case. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Excerpts from Robert Haas's message of vie ago 26 15:36:36 -0300 2011: > On Fri, Aug 26, 2011 at 1:15 PM, Andrew Dunstan <andrew@dunslane.net> wrote: > > I don't have anything in principle against your '--sections="foo bar"' > > suggestion, but it would be more work to program. Simpler, and probably more > > consistent with how we do other things, would be allowing multiple --section > > options, if we don't want to have named options such as I have provided. > > I wouldn't object to that, but "more work to program" probably means > about an extra 10 lines of code in this particular case. The "--section=data --section=indexes" proposal seems very reasonable to me -- more so than "--sections='data indexes'". -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Aug 26, 2011, at 12:15 PM, Andrew Dunstan wrote: > I knew there would be some bike-shedding about how we specify these things, which is why I haven't written docs yet. While we're debating what shade of yellow to paint the shed... My actual use case is to be able to be able to "inject" SQL into a SQL-formatted dump either pre- or post-data (I'm on 8.3,so I don't actually dump any data; I'm *mostly* emulating the ability to dump data on just certain tables). So for what I'm doing, the ideal interface would be a way to tell pg_dump "When you're done dumping all table structuresbut before you get to any constraints, please run $COMMAND and inject it's output into the dump output." For someof the data obfuscation we're doing it would be easiest if $COMMAND was a perl script instead of SQL, but we could probablyconvert it. Of course, many other folks actually need the ability to just spit out specific portions of the dump; I'm hoping we can comeup with something that supports both concepts. -- Jim C. Nasby, Database Architect jim@nasby.net 512.569.9461 (cell) http://jim.nasby.net
Alvaro Herrera <alvherre@commandprompt.com> writes: > The "--section=data --section=indexes" proposal seems very reasonable to > me -- more so than "--sections='data indexes'". +1 ... not only easier to code and less squishily defined, but more like the existing precedent for other pg_dump switches, such as --table. regards, tom lane
On 08/26/2011 04:46 PM, Jim Nasby wrote: > On Aug 26, 2011, at 12:15 PM, Andrew Dunstan wrote: >> I knew there would be some bike-shedding about how we specify these things, which is why I haven't written docs yet. > While we're debating what shade of yellow to paint the shed... > > My actual use case is to be able to be able to "inject" SQL into a SQL-formatted dump either pre- or post-data (I'm on8.3, so I don't actually dump any data; I'm *mostly* emulating the ability to dump data on just certain tables). > > So for what I'm doing, the ideal interface would be a way to tell pg_dump "When you're done dumping all table structuresbut before you get to any constraints, please run $COMMAND and inject it's output into the dump output." For someof the data obfuscation we're doing it would be easiest if $COMMAND was a perl script instead of SQL, but we could probablyconvert it. > > Of course, many other folks actually need the ability to just spit out specific portions of the dump; I'm hoping we cancome up with something that supports both concepts. > Well, the Unix approach is to use tools that do one thing well to build up more complex tools. Making pg_dump run some external command to inject things into the stream seems like the wrong thing given this philosophy. Use pg_dump to get the bits you want (pre-data, post-data) and sandwich them around whatever else you want. As for getting data from just certain tables, I just posted a patch for pg_dump to exclude data for certain tables, and we could look at providing a positive as well as a negative filter if there is sufficient demand. cheers andrew
Andrew Dunstan <andrew@dunslane.net> writes: > For anything more fine-grained, I'm inclined to say that people need to roll > their own. pg_restore's --list and --use-list give you extremely > fine-grained control. I have working scripts which use these for example to > filter out londiste and pgq objects, certain large tables, audit objects and Which is exactly the core features of pg_staging, that builds schema whitelist and schema_nodata options on top of pg_restore listing. The only complex thing here is to be able to filter out triggers using a function defined in a schema you're filtering out, but pg_staging has support for that. http://tapoueh.org/pgsql/pgstaging.html https://github.com/dimitri/pg_staging http://tapoueh.org/blog/2011/03/29-towards-pg_staging-10.html And you can also only use the pg_restore listing commands of pg_staging without having to do the full installation of its features. Will write some article about how to use it for only catalog listing purpose, without its infrastructure for fetching backups and managing dev staging environments. > I don't have anything in principle against your '--sections="foo bar"' > suggestion, but it would be more work to program. Simpler, and probably more > consistent with how we do other things, would be allowing multiple --section > options, if we don't want to have named options such as I have provided. +1 for --section foo --section bar. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
-----BEGIN PGP SIGNED MESSAGE----- Hash: RIPEMD160 > Well, notwithstanding my well known love of perl, that strikes me as > spending a pound to save a penny. And custom format dumps rock ;-) Also, > your recipe above is buggy, BTW. A CREATE INDEX statement might well not > be the first item in the post-data section. > > But we could also add these switches to pg_dump too if people feel it's > worthwhile. I haven't looked but the logic should not be terribly hard. A big +1 to --pre-data and --post-data, but until we get there, or if you have an existing dump file (schema *or* schema+data) that needs parsing, there is an existing tool: http://blog.endpoint.com/2010/01/splitting-postgres-pgdump-into-pre-and.html Once these new flags and the ability to custom format dump pg_dumpall is done, I'll have very little left to complain about with pg_dump :) - -- Greg Sabino Mullane greg@turnstep.com End Point Corporation http://www.endpoint.com/ PGP Key: 0x14964AC8 201108271855 http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8 -----BEGIN PGP SIGNATURE----- iEYEAREDAAYFAk5ZdfwACgkQvJuQZxSWSsipDQCgpmNtD/I/2gfAzm2b3jouD8nS qhgAn33t5VLiF8HeslBwCqyMzQJy6VN5 =PfK7 -----END PGP SIGNATURE-----
On 08/27/2011 06:56 PM, Greg Sabino Mullane wrote: > > Once these new flags and the ability to custom format dump pg_dumpall > is done, I'll have very little left to complain about with pg_dump :) > > It's off topic. But I think custom format would require a major mangling to be able to handle a complete cluster. This isn't just a simple matter of programming, IMNSHO. cheers andrew
On Aug 26, 2011, at 5:23 PM, Andrew Dunstan wrote: > On 08/26/2011 04:46 PM, Jim Nasby wrote: >> On Aug 26, 2011, at 12:15 PM, Andrew Dunstan wrote: >>> I knew there would be some bike-shedding about how we specify these things, which is why I haven't written docs yet. >> While we're debating what shade of yellow to paint the shed... >> >> My actual use case is to be able to be able to "inject" SQL into a SQL-formatted dump either pre- or post-data (I'm on8.3, so I don't actually dump any data; I'm *mostly* emulating the ability to dump data on just certain tables). >> >> So for what I'm doing, the ideal interface would be a way to tell pg_dump "When you're done dumping all table structuresbut before you get to any constraints, please run $COMMAND and inject it's output into the dump output." For someof the data obfuscation we're doing it would be easiest if $COMMAND was a perl script instead of SQL, but we could probablyconvert it. >> >> Of course, many other folks actually need the ability to just spit out specific portions of the dump; I'm hoping we cancome up with something that supports both concepts. >> > > Well, the Unix approach is to use tools that do one thing well to build up more complex tools. Making pg_dump run someexternal command to inject things into the stream seems like the wrong thing given this philosophy. Use pg_dump to getthe bits you want (pre-data, post-data) and sandwich them around whatever else you want. I agree... except for one little niggling concern: If pg_dump is injecting something, then the DDL is being grabbed witha single, consistent snapshot. --pre and --post do not get you that (though we could probably use the new ability toexport snapshots to fix that...) > As for getting data from just certain tables, I just posted a patch for pg_dump to exclude data for certain tables, andwe could look at providing a positive as well as a negative filter if there is sufficient demand. Unfortunately some of the dumped data needs to be sanitized, so that won't work unless I can also dump an arbitrary SELECT.But yes, a positive filter would definitely be welcome. -- Jim C. Nasby, Database Architect jim@nasby.net 512.569.9461 (cell) http://jim.nasby.net
Excerpts from Jim Nasby's message of mié ago 31 16:45:59 -0300 2011: > On Aug 26, 2011, at 5:23 PM, Andrew Dunstan wrote: > > On 08/26/2011 04:46 PM, Jim Nasby wrote: > >> On Aug 26, 2011, at 12:15 PM, Andrew Dunstan wrote: > >>> I knew there would be some bike-shedding about how we specify these things, which is why I haven't written docs yet. > >> While we're debating what shade of yellow to paint the shed... > >> > >> My actual use case is to be able to be able to "inject" SQL into a SQL-formatted dump either pre- or post-data (I'mon 8.3, so I don't actually dump any data; I'm *mostly* emulating the ability to dump data on just certain tables). > >> > >> So for what I'm doing, the ideal interface would be a way to tell pg_dump "When you're done dumping all table structuresbut before you get to any constraints, please run $COMMAND and inject it's output into the dump output." For someof the data obfuscation we're doing it would be easiest if $COMMAND was a perl script instead of SQL, but we could probablyconvert it. > >> > >> Of course, many other folks actually need the ability to just spit out specific portions of the dump; I'm hoping wecan come up with something that supports both concepts. > >> > > > > Well, the Unix approach is to use tools that do one thing well to build up more complex tools. Making pg_dump run someexternal command to inject things into the stream seems like the wrong thing given this philosophy. Use pg_dump to getthe bits you want (pre-data, post-data) and sandwich them around whatever else you want. > > I agree... except for one little niggling concern: If pg_dump is injecting something, then the DDL is being grabbed witha single, consistent snapshot. --pre and --post do not get you that (though we could probably use the new ability toexport snapshots to fix that...) Eh, --pre and --post are pg_restore flags, so you already have a consistent snapshot. -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On 08/31/2011 04:03 PM, Alvaro Herrera wrote: >>> >>> Well, the Unix approach is to use tools that do one thing well to build up more complex tools. Making pg_dump run someexternal command to inject things into the stream seems like the wrong thing given this philosophy. Use pg_dump to getthe bits you want (pre-data, post-data) and sandwich them around whatever else you want. >> I agree... except for one little niggling concern: If pg_dump is injecting something, then the DDL is being grabbed witha single, consistent snapshot. --pre and --post do not get you that (though we could probably use the new ability toexport snapshots to fix that...) > Eh, --pre and --post are pg_restore flags, so you already have a > consistent snapshot. > We've been talking about adding them for pg_dump too. I take Jim's point about the snapshot, but I still don't feel it's a good reason to allow some arbitrary code or script to be run between them (and after all, it's not likely to run with the same snapshot anyway). cheers andrew
-----BEGIN PGP SIGNED MESSAGE----- Hash: RIPEMD160 > It's off topic. But I think custom format would require a major mangling > to be able to handle a complete cluster. This isn't just a simple matter > of programming, IMNSHO. Oh, I meant just having it create separate custom format files for each database. As shell scripts all over the world have been doing for years, but it would be nice if it was simply built in. - -- Greg Sabino Mullane greg@turnstep.com End Point Corporation http://www.endpoint.com/ PGP Key: 0x14964AC8 201109012139 http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8 -----BEGIN PGP SIGNATURE----- iEYEAREDAAYFAk5gM+oACgkQvJuQZxSWSsi+xgCfbr0q+Ilbw0JRsORLZN2pSz1r JtcAoJaleZvW/wWtU83d9MVeOes4I6+0 =VqFQ -----END PGP SIGNATURE-----
"Greg Sabino Mullane" <greg@turnstep.com> writes: >> It's off topic. But I think custom format would require a major mangling >> to be able to handle a complete cluster. This isn't just a simple matter >> of programming, IMNSHO. > > Oh, I meant just having it create separate custom format files for each > database. As shell scripts all over the world have been doing for years, > but it would be nice if it was simply built in. +1 -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On 09/01/2011 09:40 PM, Greg Sabino Mullane wrote: >> It's off topic. But I think custom format would require a major mangling >> to be able to handle a complete cluster. This isn't just a simple matter >> of programming, IMNSHO. > Oh, I meant just having it create separate custom format files for each > database. As shell scripts all over the world have been doing for years, > but it would be nice if it was simply built in. > > I guess it could be done, although I'm not going to do it :-) I'm more about making somewhat hard things easier than easy things slightly easier :-) You'd have to invent some sort of way to name files, possibly by supplying a template to the -f parameter which would fill in some placeholder, say a %, with the name of the database. Of course, then you'd have to make sure the database name didn't contain any forbidden characters. cheers andrew
Andrew Dunstan <andrew@dunslane.net> writes: >> Oh, I meant just having it create separate custom format files for each >> database. As shell scripts all over the world have been doing for years, >> but it would be nice if it was simply built in. > > I guess it could be done, although I'm not going to do it :-) I'm more about > making somewhat hard things easier than easy things slightly easier :-) Then what about issuing an archive (tar or ar format here) containing one custom file per database plus the globals file, SQL, plus maybe a database listing, and hacking pg_restore so that it knows what to do with such an input ? Bonus points if that supports the current -l and -L options, of course. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On 09/03/2011 04:49 PM, Dimitri Fontaine wrote: > Andrew Dunstan<andrew@dunslane.net> writes: >>> Oh, I meant just having it create separate custom format files for each >>> database. As shell scripts all over the world have been doing for years, >>> but it would be nice if it was simply built in. >> I guess it could be done, although I'm not going to do it :-) I'm more about >> making somewhat hard things easier than easy things slightly easier :-) > Then what about issuing an archive (tar or ar format here) containing > one custom file per database plus the globals file, SQL, plus maybe a > database listing, and hacking pg_restore so that it knows what to do > with such an input ? > > Bonus points if that supports the current -l and -L options, of course. > > That's probably a lot of code for a little benefit, at least from my POV, but others might find it useful. cheers andrew
On 08/26/2011 05:11 PM, Tom Lane wrote: > Alvaro Herrera<alvherre@commandprompt.com> writes: >> The "--section=data --section=indexes" proposal seems very reasonable to >> me -- more so than "--sections='data indexes'". > +1 ... not only easier to code and less squishily defined, but more like > the existing precedent for other pg_dump switches, such as --table. > > Here is a patch for that for pg_dump. The sections provided for are pre-data, data and post-data, as discussed elsewhere. I still feel that anything finer grained should be handled via pg_restore's --use-list functionality. I'll provide a patch to do the same switch for pg_restore shortly. Adding to the commitfest. cheers andrew
Attachment
Hi Andrew, On 13/11/2011 02:56, Andrew Dunstan wrote: > Here is a patch for that for pg_dump. The sections provided for are > pre-data, data and post-data, as discussed elsewhere. I still feel that > anything finer grained should be handled via pg_restore's --use-list > functionality. I'll provide a patch to do the same switch for pg_restore > shortly. > > Adding to the commitfest. FWIW, I've tested the patch as I've recently needed to build a custom splitting script for a project and the patch seemed to be a much more elegant solution. As far as I can tell, it works great and the output matches the result of my script. The only little thing I've noticed is a missing ending ")" in the --help message. Cheers -- Matteo Beccati Development & Consulting - http://www.beccati.com/
On Sat, November 12, 2011 8:56 pm, Andrew Dunstan wrote: > > > On 08/26/2011 05:11 PM, Tom Lane wrote: >> Alvaro Herrera<alvherre@commandprompt.com> writes: >>> The "--section=data --section=indexes" proposal seems very reasonable >>> to >>> me -- more so than "--sections='data indexes'". >> +1 ... not only easier to code and less squishily defined, but more like >> the existing precedent for other pg_dump switches, such as --table. >> >> > > > Here is a patch for that for pg_dump. The sections provided for are > pre-data, data and post-data, as discussed elsewhere. I still feel that > anything finer grained should be handled via pg_restore's --use-list > functionality. I'll provide a patch to do the same switch for pg_restore > shortly. > > Adding to the commitfest. > Updated version with pg_restore included is attached. cheers andrew
Attachment
> > Here is a patch for that for pg_dump. The sections provided for are > > pre-data, data and post-data, as discussed elsewhere. I still feel that > > anything finer grained should be handled via pg_restore's --use-list > > functionality. I'll provide a patch to do the same switch for pg_restore > > shortly. > > > > Adding to the commitfest. > > > > > Updated version with pg_restore included is attached. Functionality review: I have tested the backported version of this patch using a 500GB production database with over 200 objects and it workedas specified. This functionality is extremely useful for the a variety of selective copying of databases, including creating shrunken testinstances, ad-hoc parallel dump, differently indexed copies, and sanitizing copies of sensitive data, and even bringingthe database up for usage while the indexes are still building. Note that this feature has the odd effect that some constraints are loaded at the same time as the tables and some are loadedwith the post-data. This is consistent with how text-mode pg_dump has always worked, but will seem odd to the user. This also raises the possibility of a future pg_dump/pg_restore optimization. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com San Francisco
On Tue, Nov 15, 2011 at 8:19 PM, Joshua Berkus <josh@agliodbs.com> wrote: >> > Here is a patch for that for pg_dump. The sections provided for are >> > pre-data, data and post-data, as discussed elsewhere. I still feel that >> > anything finer grained should be handled via pg_restore's --use-list >> > functionality. I'll provide a patch to do the same switch for pg_restore >> > shortly. >> > >> > Adding to the commitfest. >> >> Updated version with pg_restore included is attached. > > Functionality review: > > I have tested the backported version of this patch using a 500GB production database with over 200 objects and it workedas specified. > > This functionality is extremely useful for the a variety of selective copying of databases, including creating shrunkentest instances, ad-hoc parallel dump, differently indexed copies, and sanitizing copies of sensitive data, and evenbringing the database up for usage while the indexes are still building. > > Note that this feature has the odd effect that some constraints are loaded at the same time as the tables and some areloaded with the post-data. This is consistent with how text-mode pg_dump has always worked, but will seem odd to theuser. This also raises the possibility of a future pg_dump/pg_restore optimization. That does seem odd. Why do we do it that way? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
>> Note that this feature has the odd effect that some constraints are loaded at the same time as the tables and some areloaded with the post-data. This is consistent with how text-mode pg_dump has always worked, but will seem odd to theuser. This also raises the possibility of a future pg_dump/pg_restore optimization. > > That does seem odd. Why do we do it that way? Beats me. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Josh Berkus <josh@agliodbs.com> writes: >>> Note that this feature has the odd effect that some constraints are loaded at the same time as the tables and some areloaded with the post-data. This is consistent with how text-mode pg_dump has always worked, but will seem odd to theuser. This also raises the possibility of a future pg_dump/pg_restore optimization. >> That does seem odd. Why do we do it that way? > Beats me. Performance, mostly --- we prefer to apply checks during the original data load if possible, but for indexes and FK constraints it's faster to apply them later. Also, we can separate constraints from the original table declaration if it's necessary to break a reference circularity. This isn't something that would be wise to whack around. regards, tom lane
On 12/07/2011 11:31 AM, Tom Lane wrote: > Josh Berkus<josh@agliodbs.com> writes: >>>> Note that this feature has the odd effect that some constraints are loaded at the same time as the tables and some areloaded with the post-data. This is consistent with how text-mode pg_dump has always worked, but will seem odd to theuser. This also raises the possibility of a future pg_dump/pg_restore optimization. >>> That does seem odd. Why do we do it that way? >> Beats me. > Performance, mostly --- we prefer to apply checks during the original > data load if possible, but for indexes and FK constraints it's faster to > apply them later. Also, we can separate constraints from the original > table declaration if it's necessary to break a reference circularity. > This isn't something that would be wise to whack around. > > Yeah, and if we did want to change it that should be a TODO and not hold up this feature. cheers andrew
On Tue, Nov 15, 2011 at 6:14 PM, Andrew Dunstan <andrew@dunslane.net> wrote: > Updated version with pg_restore included is attached. The patch applies with some fuzz by now but compiles without errors or warnings. The feature just works, it is not adding a lot of new code, basically it parses the given options and then skips over steps depending on the selected section. I verified the equivalence of -a and -s to the respective sections in the different archive formats and no surprise here either, they were equivalent except for the header (which has a timestamp). If you ask pg_restore to restore a section out of an archive which doesn't have this section, there is no error and the command just succeeds. This is what I expected and I think it's the right thing to do but maybe others think that there should be a warning. In pg_restore, pre-data cannot be run in parallel, it would only run serially, data and post-data can run in parallel, though. This is also what I had expected but it might be worth to add a note about this to the documentation. What I didn't like about the implementation was the two set_section() functions, I'd prefer them to move to a file that is shared between pg_dump and pg_restore and become one function... Minor issues: {"section", required_argument, NULL, 5} in pg_dump.c is not in the alphabetical order of the options. ./pg_restore --section=foobar pg_restore: unknown section name "foobar") Note the trailing ')', it's coming from a _(...) confusion Some of the lines in the patch have trailing spaces and in the documentation part tabs and spaces are mixed. int skip used as bool skip in dumpDumpableObject() Joachim
On 12/8/11 9:18 PM, Joachim Wieland wrote: > If you ask pg_restore to restore a section out of an archive which > doesn't have this section, there is no error and the command just > succeeds. This is what I expected and I think it's the right thing to > do but maybe others think that > there should be a warning. Andrew and I discussed this previously. It's consistent with how we treat other options in pg_restore. It may be that we should be consistently treating all options differently, but I don't think that's specific to this patch. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On 12/08/2011 09:18 PM, Joachim Wieland wrote: > On Tue, Nov 15, 2011 at 6:14 PM, Andrew Dunstan<andrew@dunslane.net> wrote: >> Updated version with pg_restore included is attached. > The patch applies with some fuzz by now but compiles without errors or warnings. > > The feature just works, it is not adding a lot of new code, basically > it parses the given options and then skips over steps depending on the > selected section. > > I verified the equivalence of -a and -s to the respective sections in > the different archive formats and no surprise here either, they were > equivalent except for the header (which has a timestamp). > > If you ask pg_restore to restore a section out of an archive which > doesn't have this section, there is no error and the command just > succeeds. This is what I expected and I think it's the right thing to > do but maybe others think that > there should be a warning. > > In pg_restore, pre-data cannot be run in parallel, it would only run > serially, data and post-data can run in parallel, though. This is also > what I had expected but it might be worth to add a note about this to > the documentation. This is true now of parallel restore, and is by design (see debates from the time.) > What I didn't like about the implementation was the two set_section() > functions, I'd prefer them to move to a file that is shared between > pg_dump and pg_restore and become one function... Done > > Minor issues: > > {"section", required_argument, NULL, 5} in pg_dump.c is not in the alphabetical > order of the options. > > ./pg_restore --section=foobar > pg_restore: unknown section name "foobar") > > Note the trailing ')', it's coming from a _(...) confusion > > Some of the lines in the patch have trailing spaces and in the > documentation part tabs and spaces are mixed. > > int skip used as bool skip in dumpDumpableObject() > > Should all be fixed. Revised patch attached. cheers andrew
Attachment
On 12/12/2011 04:35 PM, Andrew Dunstan wrote: > Should all be fixed. Revised patch attached. There were two successful test results here and only minor things noted to fix, which are all cleaned up now. This seems ready for a committer now; I'm just now sure if you want to do it yourself or have someone else take a last look over it instead. -- Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
On 12/16/2011 02:43 PM, Greg Smith wrote: > On 12/12/2011 04:35 PM, Andrew Dunstan wrote: >> Should all be fixed. Revised patch attached. > > There were two successful test results here and only minor things > noted to fix, which are all cleaned up now. This seems ready for a > committer now; I'm just now sure if you want to do it yourself or have > someone else take a last look over it instead. I'll do it myself if nobody else wants to comment. cheers andrew