Thread: pg_dump and pg_dumpall in real life
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hello After some discussions in #pgconfeu, this is an attempt to relaunch the discussion about how pg_dump and pg_dumpall work and the challenges they give us in real life. We have got bitten sometimes because of their behavior and we can see it is a difficult subject for new postgres users even if they have long experience with others databases. Logical backups are used for restores or cloning purposes. If as a database administrator you want to be able to do one of these procedures for a database, you need to do this in advance today: 1) Use pg_dump to dump schema, data and privileges of the database. 2) Use pg_dumpall -g to dump global objects 3) Use pg_dumpall -g to dump ALTER ROLE ... SET ... data 4) Use pg_dumpall to dump possible ALTER DATABASE ... SET ... data For a cloning procedure we need the samme steps but usually and in addition we have to change the name of the owner/database when importing the dumps. If you have just a few and not very complicated databases in your cluster, these steps will not be very complicated although very irritating. Imagine you have several hundred databases in your cluster, with several hundred users owning some objects and with grants in others. Imagine you are cloning or restoring only one or a few of these databases to another server. For 2), 3) and 4) you will have to parse the output from pg_dumpall to get the few global objects, ALTER ROLE and ALTER DATABASE data for the few databases you are restoring. In addition, if you have used "GRANT .. ON .. TO ..." in your database objects you will have to take care of this manually to find out who has extra privileges in your objects, so you can also get the right information from pg_dumpall -g. You don't need a lot of imagination to understand what a mess this can be when moving data around. Not to talk about the possibility of doing something wrong in the process and not be a very robust solution. After many years of using pg_dump/pg_dumpall, this is our experience, our wishes and thoughts: * We need a pg_dump solution that can generate in one step all the necessary pieces of information needed when restoring or cloning a database. (schema, data, privileges, users and alter database/role data) * It would be great to be able to tell pg_restore that user1 in the dump will became user2 in the restored/cloned database. The same for the name of the database. * For serious backup management of large and complicated databases, pg_dump with the custom output + pg_restore is the only feasible solution. What do you think about the subject? Does it sound like a reasonable proposition? What do we need to implement some of these changes? Thanks in advance for your time. Some background information: Ref: http://wiki.postgresql.org/wiki/Todo http://www.postgresql.org/message-id/4864F001.50909@archonet.com http://www.postgresql.org/message-id/11646.1272814212@sss.pgh.pa.us regards, - -- Rafael Martinez GuerreroCenter for Information TechnologyUniversity of Oslo, Norway PGP Public Key: http://folk.uio.no/rafael/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.14 (GNU/Linux) iEYEARECAAYFAlKA4q4ACgkQBhuKQurGihSJJACglhZnjSTGFvzz6Rl0Vhrl3BrY gssAni2l7kOQFxzr6IlDHAd0oMryDkT5 =Ti6V -----END PGP SIGNATURE-----
* Rafael Martinez (r.m.guerrero@usit.uio.no) wrote: > * We need a pg_dump solution that can generate in one step all the > necessary pieces of information needed when restoring or cloning a > database. (schema, data, privileges, users and alter database/role data) This sounds pretty reasonable and should be possible to implement- but realize that, on the restore side, you might end up with multiple attempts to create the same objects. Consider a role that's depended upon by objects in two databases- it would be included in the dump of both of those databases and if you restored both of those into the same cluster, one of the CREATE ROLE statements would fail. I'd think this would also be a new pg_dump option along the lines of 'include global dependencies' or similar. Reading the older threads, I also agree that a '--create' version of pg_dump should include the various SET commands for the database to be configured the same as the one being dump'd. The next part seems simple- let's get someone to do it.. :) > * It would be great to be able to tell pg_restore that user1 in the > dump will became user2 in the restored/cloned database. The same for > the name of the database. This is a lot uglier, unfortunately. We've seen this multiple times before- there's not a good way to provide such a mapping as a command line option. There may also be issues with the dependency resolution.. > * For serious backup management of large and complicated databases, > pg_dump with the custom output + pg_restore is the only feasible solution. Sure; is there a question here? I don't think that means we're going to change the default, though there is a whole other thread on that subject. > What do you think about the subject? Does it sound like a reasonable > proposition? What do we need to implement some of these changes? As for 'what we need', I'd think someone with the time and energy to write the patch and work with the community to implement it.. Thanks, Stephen
On 11/11/2013 08:59 AM, Rafael Martinez wrote: > > * It would be great to be able to tell pg_restore that user1 in the > dump will became user2 in the restored/cloned database. The same for > the name of the database. A general ability to rename things would be good. In particular, restoring schema x into schema y or table x into table y would be very useful, especially if you need to be able to compare old with new. Unfortunately, this would involve a fairly significant change in the design of pg_dump / pg_restore. The stored SQL is currently fairly opaque, and a renaming scheme would probably need to implement instead some sort of placeholder mechanism. That would mean a LOT of work. cheers andrew
On 11/11/2013 06:24 AM, Stephen Frost wrote: > * Rafael Martinez (r.m.guerrero@usit.uio.no) wrote: >> * We need a pg_dump solution that can generate in one step all the >> necessary pieces of information needed when restoring or cloning a >> database. (schema, data, privileges, users and alter database/role data) > > This sounds pretty reasonable and should be possible to implement- but > realize that, on the restore side, you might end up with multiple > attempts to create the same objects. Consider a role that's depended > upon by objects in two databases- it would be included in the dump of > both of those databases and if you restored both of those into the same > cluster, one of the CREATE ROLE statements would fail. Well, then we just need pg_restore to handle the "role already exists" error message gracefully. That's all. Or a "CREATE ROLE IF NOT EXISTS" statement, and use that for roles. >> * It would be great to be able to tell pg_restore that user1 in the >> dump will became user2 in the restored/cloned database. The same for >> the name of the database. > > This is a lot uglier, unfortunately. We've seen this multiple times > before- there's not a good way to provide such a mapping as a command > line option. There may also be issues with the dependency resolution.. This sounds like one of the reasons we still *have* text-mode dumps. For stuff like this. > As for 'what we need', I'd think someone with the time and energy to > write the patch and work with the community to implement it.. +1 -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Andrew Dunstan wrote > A general ability to rename things would be good. In particular, > restoring schema x into schema y or table x into table y would be very > useful, especially if you need to be able to compare old with new. compare old and new what? I would imagine that schema comparisons would be much easier if the only thing that is different is the database name and you compare database "old" to database "new". Are there any existing threads or posts, that you recollect, that detail solid use-cases for "clone-and-rename" mechanics? I don't seem to recall anything in the past year or so but my coverage is probably only about 70% in that timeframe. SQL seems particularly unfriendly to renaming and runtime name resolution in general (largely due to caching effects). Some kind of alias mechanism makes sense conceptually but the performance hit for such isn't likely to be worth incurring. I could see having table name aliases so that raw data in a dump from one database could be restored into another but I'd likely require that the user be able to generate the target schema from source themselves. That would facilitate the use-case where the DBA/programmer is able to fully recreate their schema from source and only require that actual data be restored into the newly created database. I can see where grants may fall into a grey middle-area but functions/view/triggers and the like would need to be synchronized with any schema naming changes and that should, IMO, be driven from source and not facilitated by a dump/restore process. David J. -- View this message in context: http://postgresql.1045698.n5.nabble.com/pg-dump-and-pg-dumpall-in-real-life-tp5777718p5777816.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.
Josh Berkus wrote > Well, then we just need pg_restore to handle the "role already exists" > error message gracefully. That's all. Or a "CREATE ROLE IF NOT EXISTS" > statement, and use that for roles. My only qualm here is if the exists check is based off of role name only. If database "A" and database "B" came from different clusters but both have a role "david" the actual identity of "david" is (could be) different because the source cluster. The risk of such occurring is a high-security situation is likely to be small but some kind of "--ignore-different-cluster-same-role" flag may be worthwhile such that pg_restore will error unless that flag is set (i.e., high security by default). The error itself should be rare enough most people wouldn't even notice it is there but seeing such an error (with a hint provided as well) would be easily able to disable and continue on with the restore. David J. -- View this message in context: http://postgresql.1045698.n5.nabble.com/pg-dump-and-pg-dumpall-in-real-life-tp5777718p5777823.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.
On 11/11/2013 03:06 PM, David Johnston wrote: > Josh Berkus wrote >> Well, then we just need pg_restore to handle the "role already exists" >> error message gracefully. That's all. Or a "CREATE ROLE IF NOT EXISTS" >> statement, and use that for roles. > > My only qualm here is if the exists check is based off of role name only. > If database "A" and database "B" came from different clusters but both have > a role "david" the actual identity of "david" is (could be) different > because the source cluster. > > The risk of such occurring is a high-security situation is likely to be > small but some kind of "--ignore-different-cluster-same-role" flag may be > worthwhile such that pg_restore will error unless that flag is set (i.e., > high security by default). The error itself should be rare enough most > people wouldn't even notice it is there but seeing such an error (with a > hint provided as well) would be easily able to disable and continue on with > the restore. I'd do the opposite: let's optimize for the most common case, not the least common one. So we'd do --role-errors, which would throw a fatal error on duplicate roles, instead of just posting a WARNING. Again, this is all rather academic, unless you know someone who's eager to dig into pg_dump/pg_restore. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On 11/11/2013 05:50 PM, David Johnston wrote: > Andrew Dunstan wrote >> A general ability to rename things would be good. In particular, >> restoring schema x into schema y or table x into table y would be very >> useful, especially if you need to be able to compare old with new. > compare old and new what? Data is what I had in mind. There have been plenty of times when I've been asked ex post to find out what's changed in some table in the last 24 hours or something like that, and all I've had to work with is yesterday's dump file. The handsprings you have to turn in order to get the old version of the table and the new version side by side make it painful - it would be nice to be able to say "restore this table but with that name," or "restore this table but into that schema". cheers andrew
<div dir="ltr"><div class="gmail_extra"><br />On Mon, Nov 11, 2013 at 8:20 PM, Josh Berkus <<a href="mailto:josh@agliodbs.com">josh@agliodbs.com</a>>wrote:<br />><br />><br />> [...]<br />><br />> Well,then we just need pg_restore to handle the "role already exists"<br /> > error message gracefully. That's all. Ora "CREATE ROLE IF NOT EXISTS"<br />> statement, and use that for roles.<br />><br /><br />I'm working in a patchto add IF NOT EXISTS for all CREATE statements,<br /> including of course the CREATE ROLE statement.<br /> <br /></div><divclass="gmail_extra">Regards,<br /><br />--<br />Fabrízio de Royes Mello<br />Consultoria/Coaching PostgreSQL<br/>>> Timbira: <a href="http://www.timbira.com.br">http://www.timbira.com.br</a><br /> >> Blog sobreTI: <a href="http://fabriziomello.blogspot.com">http://fabriziomello.blogspot.com</a><br />>> Perfil Linkedin:<a href="http://br.linkedin.com/in/fabriziomello">http://br.linkedin.com/in/fabriziomello</a><br /> >> Twitter:<a href="http://twitter.com/fabriziomello">http://twitter.com/fabriziomello</a></div></div>
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 11/11/2013 09:59 PM, Rafael Martinez wrote: > * We need a pg_dump solution that can generate in one step all the > necessary pieces of information needed when restoring or cloning a > database. (schema, data, privileges, users and alter database/role > data) ... and if some users/roles already exist, but have different meanings? Or some roles exist and some don't? I'm very strongly against adding and using CREATE ROLE IF NOT EXISTS. pg_restore should handle this case-by-case, forcing the user to specify explicitly role-by-role that they want a given role in the existing DB re-used if it exists, or want a new one created with a new name in case of a clash. A --rename-all-conflicting-roles and --reuse-all-conflicting-roles option could be added for the all-or-none options. IMO if neither is specified, the existence of any role name conflict should be a restore error. > * It would be great to be able to tell pg_restore that user1 in > the dump will became user2 in the restored/cloned database. The > same for the name of the database. Agreed that this would be useful. Needs to deal with the case where the users should be separated but they should remain a member of some common role, though - eg "olduser" becomes "newuser" but the dumped "olduser" was member of role "users" and "newuser" should also be member of "users", not some renamed role. - -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.15 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ iQEcBAEBAgAGBQJSgcfKAAoJELBXNkqjr+S2XWIH/2c1Mcd4ldVTAPw/jAln4gNM YH8SRPlsGU0fqfbYoKg/1y0K/Wdjdlac9bjjGzYpODryaXGopf1i+pWaphF2kJTM LeMRVgEFEW7u2Dr6FXajQTQCiXLnA8C16NmmgIdqZZgYCsOwCorG+gFNfI8fZyft okCQpYcljGXzlc218DI6/o4OZBBSdLh8diTzF8+xywoXJZopdAwfHDPPpAvizPye rcUUkq1svArq78HakSuI8HoCy3ZHuiCf8mQEUPcLhFrwgh+bkrs29W7YAdD75gr4 yp32XeyOY5npXHaG9mHghs7anbUnwywJVEzpwKAf0SyPe7zunw8fdtx2NSF70no= =dusS -----END PGP SIGNATURE-----
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 11/11/2013 11:20 PM, Josh Berkus wrote: > On 11/11/2013 06:24 AM, Stephen Frost wrote: >> * Rafael Martinez (r.m.guerrero@usit.uio.no) wrote: >>> * We need a pg_dump solution that can generate in one step all >>> the necessary pieces of information needed when restoring or >>> cloning a database. (schema, data, privileges, users and alter >>> database/role data) >> [.........] > >> As for 'what we need', I'd think someone with the time and energy >> to write the patch and work with the community to implement it.. > > +1 > Well, I am willing to take a chance on the first suggestion if nobody else has the time or energy. I have never sent a patch or have worked with the postgres code, but I think it can be done without a lot of work with some reuse of the code used in pg_dumpall. This is a proposal based on the feedback we have received: * pg_dump will also deliver information about "ALTER DATABASE ... SET" data for a given database when the option '--create' is used. * pg_dump will deliver information about ROLES used and "ALTER ROLE ... SET" data for a given database when a new option i.e. "--roles-global" is used. * pg_restore will restore ROLE information when used with a new option i.e. "--roles-global" and "ALTER DATABASE ... SET" information when used with the '--create' option. * We need to do something with how pg_restore will handle ROLES information because some security concerns when restoring roles that already exists on the target server. Some of the suggestions are: a) Implement and use CREATE ROLE IF NOT EXISTS and just throw a warning or handle the "role already exists" error message gracefully. b) Use a new option i.e. "--reuse-roles-in-conflict" to behave like suggestion a). If this option is not used, pg_restore will stop with a fatal error when a role already exist. c) Use a new option i.e. "--on-role-error-stop" to stop with a fatal error when a role already exist. If this option is not used pg_restore will behave like suggestion a). d) Use a new option i.e. "--rename-roles-in-conflict" to rename the roles that already exists. If this option is not used, pg_restore will stop with a fatal error when a role already exist. I think I prefer b) to continue with the postgres spirit of security by default. d) is too complicated for me due to lack of knowledge of the postgres code. Comments? regards, - -- Rafael Martinez GuerreroCenter for Information TechnologyUniversity of Oslo, Norway PGP Public Key: http://folk.uio.no/rafael/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.14 (GNU/Linux) iEYEARECAAYFAlKB8w0ACgkQBhuKQurGihSq7QCfZdzreRGNRx0vUzXPjYqzNOIP LqgAoJnfeCYjsfEUmsYvvp3DSL959IRL =8Ynv -----END PGP SIGNATURE-----
* Rafael Martinez (r.m.guerrero@usit.uio.no) wrote: > Comments? Create a wiki page for it. :) Thanks, Stephen
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 11/12/2013 03:28 PM, Stephen Frost wrote: > * Rafael Martinez (r.m.guerrero@usit.uio.no) wrote: >> Comments? > > Create a wiki page for it. :) > What about this to start with?: https://wiki.postgresql.org/wiki/Pg_dump_improvements - -- Rafael Martinez Guerrero Center for Information Technology University of Oslo, Norway PGP Public Key: http://folk.uio.no/rafael/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ iEYEARECAAYFAlKCTfAACgkQBhuKQurGihSrYQCeKyYVthpbk47hGjayBjidqaFL nysAn3JJjGT/8SuDUi2Nt6hEZ4eu1smz =7wjV -----END PGP SIGNATURE-----