Thread: pg_dump bug fixing
Hi everyone, I've decided to attempt to nail all known bugs in pg_dump for 7.5 :) So, please send me ALL your known bugs/issues with pg_dump, pg_dumpall and pg_restore. Note that I am NOT interested in feature requests, ONLY bugs. A bug is considered to be an issue in pg_dump that means that when a legally arrived at state in your PostgreSQL database, running pg_dump and then restoring that dump does not result in an identical state. Legally means 'obtained without manual catalog hacking' and 'identical' means except for object OIDs. The current list of known issues (for which I haven't submitted a fix yet) that I have are as follows: * Circular view dependencies (a pretty minor/rare issue...can only be "fixed" by banning it in the backend) * Alter object owner, privileges get a bit messed up. This is really a backend bug, but there might be a pg_dump workaround for it. * Tablespace that primary key and unique constraint indexes are in are not dumped * If you drop your public schema, a drop command is not issued for it in the dump, so when you restore your public schema is back Does anyone have any others? Chris
On Sun, Jul 18, 2004 at 14:33:09 +0800, Christopher Kings-Lynne <chriskl@familyhealth.com.au> wrote: > > * If you drop your public schema, a drop command is not issued for it in > the dump, so when you restore your public schema is back I am not sure that is really a bug. If someone really wants less than what is in template1, they should be dropping stuff from template1 before recreating the database.
> I am not sure that is really a bug. If someone really wants less than > what is in template1, they should be dropping stuff from template1 > before recreating the database. No, because pg_dump itself dumps template1's contents... Chris
On Sun, Jul 18, 2004 at 19:42:09 +0800, Christopher Kings-Lynne <chriskl@familyhealth.com.au> wrote: > >I am not sure that is really a bug. If someone really wants less than > >what is in template1, they should be dropping stuff from template1 > >before recreating the database. > > No, because pg_dump itself dumps template1's contents... Are you planning on extending this line of thought to other things in template1 such as operators?
On Sun, 2004-07-18 at 07:42, Christopher Kings-Lynne wrote: > > I am not sure that is really a bug. If someone really wants less than > > what is in template1, they should be dropping stuff from template1 > > before recreating the database. > > No, because pg_dump itself dumps template1's contents... Hmm. 1. Add language handler (say plpgsql) to template1. 2. Create new database 3. Dump database 4. Restore dump, having pg_dump create a new database Doh.. errors because language handlers are there twice. You can play this game with tables, sequences, and all sorts of other things that might be useful in the default template. pg_dump just skips it (not necessarily bad unless they did an alter table afterward) but pg_restore doesn't like this at all. I think what we want is a clean template without all of the extras that template1 has. If we dump & restore the public schema and other items that we might be interested in having, we can have pg_dump use template_clean as a template for new databases. Template_clean contains none of the things that pg_dump can dump/restore (like language handlers) but is NOT the default template. I've done this myself, removed lots of stuff from template1 after creating a template_<companyname> that is used for creating new DBs with. This works great when we remember to specify WITH TEMPLATE.
>>No, because pg_dump itself dumps template1's contents... > > Are you planning on extending this line of thought to other things in > template1 such as operators? Ah, I see where you are going with this.... I think that we should treat the public schema specially :) That's because it is much more likely to be dropped and messed with than the other system objects, in fact you're encouraged to drop it in the docs. Chris
Rod Taylor <pg@rbt.ca> writes: > I think what we want is a clean template without all of the extras that > template1 has. Sounds like a job for ... template0 ! Seriously, this thread would be more convincing if anyone in it betrayed any knowledge that pg_dump wants you to start from template0 rather than template1. regards, tom lane
> Sounds like a job for ... template0 ! > > Seriously, this thread would be more convincing if anyone in it betrayed > any knowledge that pg_dump wants you to start from template0 rather than > template1. What if we made it so that template1 is always restored last? Won't that be an improvement? Chris
On Sun, 2004-07-18 at 23:55, Tom Lane wrote: > Rod Taylor <pg@rbt.ca> writes: > > I think what we want is a clean template without all of the extras that > > template1 has. > > Sounds like a job for ... template0 ! It doesn't quite work in my case as I've removed items included in template0 (public schema).
Christopher Kings-Lynne <chriskl@familyhealth.com.au> writes: > I think that we should treat the public schema specially :) We already are to some extent, since pg_dump will dump its comment and privileges, which it would not do for any other predefined object. I think this is actually an implementation artifact rather than something that was explicitly intended at the time, but since no one has complained about it, it's probably a good thing ;-) Also, if we're going to apply Fabien's proposed patch to alter the ownership of the public schema, that's still another way in which the public schema becomes less like a system-defined object ... regards, tom lane
Rod, > I think what we want is a clean template without all of the extras that > template1 has. We have this, it's called Template0. Actually, KL, that would solve a lot of these duplicate object problems. What if pg_restore used Template0 and not Template1? It wouldn't fix the "drop public schema" issue but it would solve the others. -- Josh Berkus Aglio Database Solutions San Francisco
On Mon, 2004-07-19 at 12:36, Josh Berkus wrote: > Rod, > > > I think what we want is a clean template without all of the extras that > > template1 has. > > We have this, it's called Template0. Doesn't work for me. I remove a number of things that are included by default in template0, but yes, it's close. > Actually, KL, that would solve a lot of these duplicate object problems. What > if pg_restore used Template0 and not Template1? It wouldn't fix the "drop > public schema" issue but it would solve the others. Remove the public schema from template0, but leave it in template1. Have pg_dump treat the public schema the same as all of the other ones.
Rod, > Remove the public schema from template0, but leave it in template1. Have > pg_dump treat the public schema the same as all of the other ones. Hmmm. No good; it wipes out the primary purpose of Template0, which is to restore a corrupted Template1. -- -Josh BerkusAglio Database SolutionsSan Francisco
Rod Taylor <pg@rbt.ca> writes: > Remove the public schema from template0, but leave it in template1. Does not sound very workable. One of the functions of template0 is to be a backup for the virgin state of template1, and you'd lose that. regards, tom lane
On Mon, 2004-07-19 at 13:30, Josh Berkus wrote: > Rod, > > > Remove the public schema from template0, but leave it in template1. Have > > pg_dump treat the public schema the same as all of the other ones. > > Hmmm. No good; it wipes out the primary purpose of Template0, which is to > restore a corrupted Template1. Yeah.. This is why my first suggestion was to create a new template for this purpose -- of which most responses told me about template0.
> We already are to some extent, since pg_dump will dump its comment and > privileges, which it would not do for any other predefined object. > I think this is actually an implementation artifact rather than > something that was explicitly intended at the time, but since no one > has complained about it, it's probably a good thing ;-) Does that mean your in favour of dumping a DROP SCHEMA public; command if they have dropped their public schema? It's definitely not worth doing it for any other "system" object due to upwards compatibility of the dump files... Chris
> Actually, KL, that would solve a lot of these duplicate object problems. What > if pg_restore used Template0 and not Template1? It wouldn't fix the "drop > public schema" issue but it would solve the others. Not sure what you mean here, but CVS pg_dump dumps like this: CREATE DATABASE phppgadmin WITH TEMPLATE = template0 OWNER = chriskl ENCODING = 'LATIN1'; CREATE DATABASE usa WITH TEMPLATE = template0 OWNER = usadmin ENCODING = 'LATIN1'; Chris
On Mon, 2004-07-19 at 21:20, Christopher Kings-Lynne wrote: > > We already are to some extent, since pg_dump will dump its comment and > > privileges, which it would not do for any other predefined object. > > I think this is actually an implementation artifact rather than > > something that was explicitly intended at the time, but since no one > > has complained about it, it's probably a good thing ;-) > > Does that mean your in favour of dumping a DROP SCHEMA public; command > if they have dropped their public schema? It's definitely not worth > doing it for any other "system" object due to upwards compatibility of > the dump files... Please don't. It would be rather surprising to have stuff disappear from a database while doing a restore -- especially if it's a CASCADE operation. Creating the public schema when needed and leaving it out by default is preferable.
In article <1090256502.414.17.camel@jester>, Rod Taylor <pg@rbt.ca> writes: > On Mon, 2004-07-19 at 12:36, Josh Berkus wrote: >> Rod, >> >> > I think what we want is a clean template without all of the extras that >> > template1 has. >> >> We have this, it's called Template0. > Doesn't work for me. I remove a number of things that are included by > default in template0, but yes, it's close. I think pg_dump should do a kind of "diff" between template1 and the database in question and include the necessary statements in the dump to allow pg_restore to "replay" the diff after it has created the database from template1.
KL, > CREATE DATABASE phppgadmin WITH TEMPLATE = template0 OWNER = chriskl > ENCODING = 'LATIN1'; > CREATE DATABASE usa WITH TEMPLATE = template0 OWNER = usadmin ENCODING = > 'LATIN1'; Ok, so that would come under the heading of "already fixed". Great. -- Josh Berkus Aglio Database Solutions San Francisco
Chris, I've just found a bit of undesirable functionality which I would call a bug in pg_dump. I'm not sure everyone would, but we'll see. Problem: the script which dumps globals such as users (pg_dumpall -g) involves deleting *all* users from the pg_shadow table via a direct update to that table. What this means in effect is that, should you attempt to use "pg_dumpall -g" to *move* a set of users from one active server to another (such as for transferring a database) the resulting pg_dump file will delete all of the users which previously existed on that server. This is a non-trivial accident to have happen on a shared machine; once users are dumped, all of their ownerships and permissions go with them. If you have a complex permissions system, better hope you backed up first! I find this behavior highly undesirable, and consider it a bug. The globals dump should just add users, and not delete any. -- -Josh BerkusAglio Database SolutionsSan Francisco
Josh Berkus <josh@agliodbs.com> writes: > Problem: the script which dumps globals such as users (pg_dumpall -g) > involves deleting *all* users from the pg_shadow table via a direct update to > that table. AFAICS that happens only if you've specified the -c (--clean) option. Hence, I don't think it's a bug. regards, tom lane
> > This is a non-trivial accident to have happen on a shared machine; once users > are dumped, all of their ownerships and permissions go with them. If you > have a complex permissions system, better hope you backed up first! > > I find this behavior highly undesirable, and consider it a bug. The globals > dump should just add users, and not delete any. Unless the --clean option is passed, yes I agree with you. The other issue is that it is silly to have to use pg_dumpall to get the globals. A person should be able to pull a pg_dump on a particular database and get everything that is required to run that database. Including users. Joshua D. Drake > -- Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC Postgresql support, programming shared hosting and dedicated hosting. +1-503-667-4564 - jd@commandprompt.com - http://www.commandprompt.com Mammoth PostgreSQL Replicator. Integrated Replication for PostgreSQL
Attachment
Tom, > AFAICS that happens only if you've specified the -c (--clean) option. > Hence, I don't think it's a bug. Nope, happens even if you don't pass --clean. -- -Josh BerkusAglio Database SolutionsSan Francisco
On Mon, 2 Aug 2004, Tom Lane wrote: > Josh Berkus <josh@agliodbs.com> writes: >> Problem: the script which dumps globals such as users (pg_dumpall -g) >> involves deleting *all* users from the pg_shadow table via a direct update to >> that table. > > AFAICS that happens only if you've specified the -c (--clean) option. > Hence, I don't think it's a bug. Is --clean a new option? Cause I've had the same thing happen to us also ... thank god for backups :) ---- Marc G. Fournier Hub.Org Networking Services (http://www.hub.org) Email: scrappy@hub.org Yahoo!: yscrappy ICQ: 7615664
Josh Berkus <josh@agliodbs.com> writes: >> AFAICS that happens only if you've specified the -c (--clean) option. >> Hence, I don't think it's a bug. > Nope, happens even if you don't pass --clean. Not in CVS tip ... but you're right, older versions did act that way. Looks like someone addressed this already. regards, tom lane
> I've just found a bit of undesirable functionality which I would call a bug in > pg_dump. I'm not sure everyone would, but we'll see. > > Problem: the script which dumps globals such as users (pg_dumpall -g) > involves deleting *all* users from the pg_shadow table via a direct update to > that table. What this means in effect is that, should you attempt to use > "pg_dumpall -g" to *move* a set of users from one active server to another > (such as for transferring a database) the resulting pg_dump file will delete > all of the users which previously existed on that server. > > This is a non-trivial accident to have happen on a shared machine; once users > are dumped, all of their ownerships and permissions go with them. If you > have a complex permissions system, better hope you backed up first! > > I find this behavior highly undesirable, and consider it a bug. The globals > dump should just add users, and not delete any. Yeah, it's nasty. One of the fixes that's already in from me is to make the DELETE FROM pg_shadow and DELETE FROM pg_group only appear when -c mode is set. Maybe even when -c mode is set we should use DROP USER commands? Do others agree? Chris
>>>AFAICS that happens only if you've specified the -c (--clean) option. >>>Hence, I don't think it's a bug. > > >>Nope, happens even if you don't pass --clean. > > > Not in CVS tip ... but you're right, older versions did act that way. > Looks like someone addressed this already. Yeah, was one of my fixes. Should we change it to use DROP USER commands anyway? Chris
>> I find this behavior highly undesirable, and consider it a bug. The >> globals dump should just add users, and not delete any. > > > Unless the --clean option is passed, yes I agree with you. The other > issue is that it is silly to have to use pg_dumpall to get the globals. > A person should be able to pull a pg_dump on a particular database and > get everything that is required to run that database. Including users. Another reason to combine pg_dumpall into pg_dump... Chris
Chris, > Another reason to combine pg_dumpall into pg_dump... No argument here. Are you thinking of that? -- -Josh BerkusAglio Database SolutionsSan Francisco
>>Another reason to combine pg_dumpall into pg_dump... > > No argument here. Are you thinking of that? Yeah. Would be a bit of work though. Chris
At 01:17 PM 3/08/2004, Christopher Kings-Lynne wrote: >Would be a bit of work though. I've been looking at this for a while now, and will probably give it a go for 7.6/8. ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.B.N. 75 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 03 5330 3172 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp.mit.edu:11371 |/
> I've been looking at this for a while now, and will probably give it a > go for 7.6/8. Let me know when you do, I'd be interested in collaborating. Chris
At 02:00 PM 3/08/2004, Christopher Kings-Lynne wrote: >I'd be interested in collaborating. Sounds good. ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.B.N. 75 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 03 5330 3172 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp.mit.edu:11371 |/
Christopher Kings-Lynne wrote: >> I've been looking at this for a while now, and will probably give it >> a go for 7.6/8. > > > Let me know when you do, I'd be interested in collaborating. > Command Prompt, if would help could help sponsor this project. Sincerely, Joshua D. Drake > Chris -- Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC Postgresql support, programming shared hosting and dedicated hosting. +1-503-667-4564 - jd@commandprompt.com - http://www.commandprompt.com PostgreSQL Replicator -- production quality replication for PostgreSQL
Chris, > I was thinking a pg_export utility that can output to a range of other > databases SQL formats would also be a good idea. It would share about > 90% of the pg_dump code, but I'm trying to think of how to avoid > duplicating the code. I'm not really keen on this idea unless you're eager to make a 5-year commitment to maintain the code. The load formats of other RDBMSes change all the time -- MySQL is a particularly egregious example, with 2 incompatible changes in the last year -- and it would become a pain to keep track. More to the point, there are a number of 3rd-party OSS and proprietary utilities which can do this kind of format conversion. For example, Perl DB::Interpolator will cover PG, MySQL, Oracle and MSSQL once that functionality is out of beta. I can see, though, having a --strict-sql switch for pg_dump which would dump all database objects in strict SQL92, which should be pretty compatible with other systems. This should also be easier to implement and trivial to maintain. Though it would mean not dumping functions and doing a few data type conversions. -- Josh Berkus Aglio Database Solutions San Francisco
> I'm not really keen on this idea unless you're eager to make a 5-year > commitment to maintain the code. The load formats of other RDBMSes change > all the time -- MySQL is a particularly egregious example, with 2 > incompatible changes in the last year -- and it would become a pain to keep > track. Well, I could do it on pgfoundry, but it would really suck to have to dupe all the pg_dump code. Maybe I will have to. > More to the point, there are a number of 3rd-party OSS and proprietary > utilities which can do this kind of format conversion. For example, Perl > DB::Interpolator will cover PG, MySQL, Oracle and MSSQL once that > functionality is out of beta. Do they convert the sql dumps or dump from the backend? I really, really want to make a mysql2pgsql converter that doesn't really on text file parsing. Modifying mysqldump would be easiest, but the problem is licensing I think... > I can see, though, having a --strict-sql switch for pg_dump which would dump > all database objects in strict SQL92, which should be pretty compatible with > other systems. This should also be easier to implement and trivial to > maintain. Though it would mean not dumping functions and doing a few data > type conversions. Yeah, perhaps. And issuing a log of warnings so you can see what information you've lost. Chris