Thread: pg_dump bug fixing

pg_dump bug fixing

From
Christopher Kings-Lynne
Date:
Hi everyone,

I've decided to attempt to nail all known bugs in pg_dump for 7.5 :)

So, please send me ALL your known bugs/issues with pg_dump, pg_dumpall 
and pg_restore.  Note that I am NOT interested in feature requests, ONLY 
bugs.

A bug is considered to be an issue in pg_dump that means that when a 
legally arrived at state in your PostgreSQL database, running pg_dump 
and then restoring that dump does not result in an identical state. 
Legally means 'obtained without manual catalog hacking' and 'identical' 
means except for object OIDs.

The current list of known issues (for which I haven't submitted a fix 
yet) that I have are as follows:

* Circular view dependencies (a pretty minor/rare issue...can only be 
"fixed" by banning it in the backend)

* Alter object owner, privileges get a bit messed up.  This is really a 
backend bug, but there might be a pg_dump workaround for it.

* Tablespace that primary key and unique constraint indexes are in are 
not dumped

* If you drop your public schema, a drop command is not issued for it in 
the dump, so when you restore your public schema is back

Does anyone have any others?

Chris



Re: pg_dump bug fixing

From
Bruno Wolff III
Date:
On Sun, Jul 18, 2004 at 14:33:09 +0800, Christopher Kings-Lynne <chriskl@familyhealth.com.au> wrote:
> 
> * If you drop your public schema, a drop command is not issued for it in 
> the dump, so when you restore your public schema is back

I am not sure that is really a bug. If someone really wants less than
what is in template1, they should be dropping stuff from template1
before recreating the database.


Re: pg_dump bug fixing

From
Christopher Kings-Lynne
Date:
> I am not sure that is really a bug. If someone really wants less than
> what is in template1, they should be dropping stuff from template1
> before recreating the database.

No, because pg_dump itself dumps template1's contents...

Chris



Re: pg_dump bug fixing

From
Bruno Wolff III
Date:
On Sun, Jul 18, 2004 at 19:42:09 +0800, Christopher Kings-Lynne <chriskl@familyhealth.com.au> wrote:
> >I am not sure that is really a bug. If someone really wants less than
> >what is in template1, they should be dropping stuff from template1
> >before recreating the database.
> 
> No, because pg_dump itself dumps template1's contents...

Are you planning on extending this line of thought to other things in
template1 such as operators?


Re: pg_dump bug fixing

From
Rod Taylor
Date:
On Sun, 2004-07-18 at 07:42, Christopher Kings-Lynne wrote:
> > I am not sure that is really a bug. If someone really wants less than
> > what is in template1, they should be dropping stuff from template1
> > before recreating the database.
> 
> No, because pg_dump itself dumps template1's contents...

Hmm.

1. Add language handler (say plpgsql) to template1.
2. Create new database
3. Dump database
4. Restore dump, having pg_dump create a new database

Doh.. errors because language handlers are there twice.

You can play this game with tables, sequences, and all sorts of other
things that might be useful in the default template. pg_dump just skips
it (not necessarily bad unless they did an alter table afterward) but
pg_restore doesn't like this at all.


I think what we want is a clean template without all of the extras that
template1 has.

If we dump & restore the public schema and other items that we might be
interested in having, we can have pg_dump use template_clean as a
template for new databases. Template_clean contains none of the things
that pg_dump can dump/restore (like language handlers) but is NOT the
default template.

I've done this myself, removed lots of stuff from template1 after
creating a template_<companyname> that is used for creating new DBs
with. This works great when we remember to specify WITH TEMPLATE.




Re: pg_dump bug fixing

From
Christopher Kings-Lynne
Date:
>>No, because pg_dump itself dumps template1's contents...
> 
> Are you planning on extending this line of thought to other things in
> template1 such as operators?

Ah, I see where you are going with this....

I think that we should treat the public schema specially :)  That's 
because it is much more likely to be dropped and messed with than the 
other system objects, in fact you're encouraged to drop it in the docs.

Chris



Re: pg_dump bug fixing

From
Tom Lane
Date:
Rod Taylor <pg@rbt.ca> writes:
> I think what we want is a clean template without all of the extras that
> template1 has.

Sounds like a job for ... template0 !

Seriously, this thread would be more convincing if anyone in it betrayed
any knowledge that pg_dump wants you to start from template0 rather than
template1.
        regards, tom lane


Re: pg_dump bug fixing

From
Christopher Kings-Lynne
Date:
> Sounds like a job for ... template0 !
> 
> Seriously, this thread would be more convincing if anyone in it betrayed
> any knowledge that pg_dump wants you to start from template0 rather than
> template1.

What if we made it so that template1 is always restored last?  Won't 
that be an improvement?

Chris



Re: pg_dump bug fixing

From
Rod Taylor
Date:
On Sun, 2004-07-18 at 23:55, Tom Lane wrote:
> Rod Taylor <pg@rbt.ca> writes:
> > I think what we want is a clean template without all of the extras that
> > template1 has.
> 
> Sounds like a job for ... template0 !

It doesn't quite work in my case as I've removed items included in
template0 (public schema).



Re: pg_dump bug fixing

From
Tom Lane
Date:
Christopher Kings-Lynne <chriskl@familyhealth.com.au> writes:
> I think that we should treat the public schema specially :)

We already are to some extent, since pg_dump will dump its comment and
privileges, which it would not do for any other predefined object.
I think this is actually an implementation artifact rather than
something that was explicitly intended at the time, but since no one
has complained about it, it's probably a good thing ;-)

Also, if we're going to apply Fabien's proposed patch to alter the
ownership of the public schema, that's still another way in which the
public schema becomes less like a system-defined object ...
        regards, tom lane


Re: pg_dump bug fixing

From
Josh Berkus
Date:
Rod,

> I think what we want is a clean template without all of the extras that
> template1 has.

We have this, it's called Template0.

Actually, KL, that would solve a lot of these duplicate object problems.  What 
if pg_restore used Template0 and not Template1?    It wouldn't fix the "drop 
public schema" issue but it would solve the others.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco


Re: pg_dump bug fixing

From
Rod Taylor
Date:
On Mon, 2004-07-19 at 12:36, Josh Berkus wrote:
> Rod,
> 
> > I think what we want is a clean template without all of the extras that
> > template1 has.
> 
> We have this, it's called Template0.

Doesn't work for me. I remove a number of things that are included by
default in template0, but yes, it's close.

> Actually, KL, that would solve a lot of these duplicate object problems.  What 
> if pg_restore used Template0 and not Template1?    It wouldn't fix the "drop 
> public schema" issue but it would solve the others.

Remove the public schema from template0, but leave it in template1. Have
pg_dump treat the public schema the same as all of the other ones.




Re: pg_dump bug fixing

From
Josh Berkus
Date:
Rod,

> Remove the public schema from template0, but leave it in template1. Have
> pg_dump treat the public schema the same as all of the other ones.

Hmmm.  No good; it wipes out the primary purpose of Template0, which is to 
restore a corrupted Template1.

-- 
-Josh BerkusAglio Database SolutionsSan Francisco



Re: pg_dump bug fixing

From
Tom Lane
Date:
Rod Taylor <pg@rbt.ca> writes:
> Remove the public schema from template0, but leave it in template1.

Does not sound very workable.  One of the functions of template0 is to
be a backup for the virgin state of template1, and you'd lose that.
        regards, tom lane


Re: pg_dump bug fixing

From
Rod Taylor
Date:
On Mon, 2004-07-19 at 13:30, Josh Berkus wrote:
> Rod,
> 
> > Remove the public schema from template0, but leave it in template1. Have
> > pg_dump treat the public schema the same as all of the other ones.
> 
> Hmmm.  No good; it wipes out the primary purpose of Template0, which is to 
> restore a corrupted Template1.

Yeah.. This is why my first suggestion was to create a new template for
this purpose -- of which most responses told me about template0.



Re: pg_dump bug fixing

From
Christopher Kings-Lynne
Date:
> We already are to some extent, since pg_dump will dump its comment and
> privileges, which it would not do for any other predefined object.
> I think this is actually an implementation artifact rather than
> something that was explicitly intended at the time, but since no one
> has complained about it, it's probably a good thing ;-)

Does that mean your in favour of dumping a DROP SCHEMA public; command 
if they have dropped their public schema?  It's definitely not worth 
doing it for any other "system" object due to upwards compatibility of 
the dump files...

Chris



Re: pg_dump bug fixing

From
Christopher Kings-Lynne
Date:
> Actually, KL, that would solve a lot of these duplicate object problems.  What 
> if pg_restore used Template0 and not Template1?    It wouldn't fix the "drop 
> public schema" issue but it would solve the others.

Not sure what you mean here, but CVS pg_dump dumps like this:

CREATE DATABASE phppgadmin WITH TEMPLATE = template0 OWNER = chriskl 
ENCODING = 'LATIN1';
CREATE DATABASE usa WITH TEMPLATE = template0 OWNER = usadmin ENCODING = 
'LATIN1';

Chris



Re: pg_dump bug fixing

From
Rod Taylor
Date:
On Mon, 2004-07-19 at 21:20, Christopher Kings-Lynne wrote:
> > We already are to some extent, since pg_dump will dump its comment and
> > privileges, which it would not do for any other predefined object.
> > I think this is actually an implementation artifact rather than
> > something that was explicitly intended at the time, but since no one
> > has complained about it, it's probably a good thing ;-)
> 
> Does that mean your in favour of dumping a DROP SCHEMA public; command 
> if they have dropped their public schema?  It's definitely not worth 
> doing it for any other "system" object due to upwards compatibility of 
> the dump files...

Please don't. It would be rather surprising to have stuff disappear from
a database while doing a restore -- especially if it's a CASCADE
operation.

Creating the public schema when needed and leaving it out by default is
preferable.




Re: pg_dump bug fixing

From
Harald Fuchs
Date:
In article <1090256502.414.17.camel@jester>,
Rod Taylor <pg@rbt.ca> writes:

> On Mon, 2004-07-19 at 12:36, Josh Berkus wrote:
>> Rod,
>> 
>> > I think what we want is a clean template without all of the extras that
>> > template1 has.
>> 
>> We have this, it's called Template0.

> Doesn't work for me. I remove a number of things that are included by
> default in template0, but yes, it's close.

I think pg_dump should do a kind of "diff" between template1 and the
database in question and include the necessary statements in the dump
to allow pg_restore to "replay" the diff after it has created the
database from template1.



Re: pg_dump bug fixing

From
Josh Berkus
Date:
KL,

> CREATE DATABASE phppgadmin WITH TEMPLATE = template0 OWNER = chriskl
> ENCODING = 'LATIN1';
> CREATE DATABASE usa WITH TEMPLATE = template0 OWNER = usadmin ENCODING =
> 'LATIN1';

Ok, so that would come under the heading of "already fixed".  Great.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco


Re: pg_dump bug fixing

From
Josh Berkus
Date:
Chris,

I've just found a bit of undesirable functionality which I would call a bug in 
pg_dump.   I'm not sure everyone would, but we'll see.

Problem:  the script which dumps globals such as users (pg_dumpall -g) 
involves deleting *all* users from the pg_shadow table via a direct update to 
that table.   What this means in effect is that, should you attempt to use 
"pg_dumpall -g" to *move* a set of users from one active server to another 
(such as for transferring a database) the resulting pg_dump file will delete 
all of the users which previously existed on that server.

This is a non-trivial accident to have happen on a shared machine; once users 
are dumped, all of their ownerships and permissions go with them.   If you 
have a complex permissions system, better hope you backed up first!

I find this behavior highly undesirable, and consider it a bug.    The globals 
dump should just add users, and not delete any.

-- 
-Josh BerkusAglio Database SolutionsSan Francisco



Re: pg_dump bug fixing

From
Tom Lane
Date:
Josh Berkus <josh@agliodbs.com> writes:
> Problem:  the script which dumps globals such as users (pg_dumpall -g) 
> involves deleting *all* users from the pg_shadow table via a direct update to
> that table.

AFAICS that happens only if you've specified the -c (--clean) option.
Hence, I don't think it's a bug.
        regards, tom lane


Re: pg_dump bug fixing

From
"Joshua D. Drake"
Date:
>
> This is a non-trivial accident to have happen on a shared machine; once users
> are dumped, all of their ownerships and permissions go with them.   If you
> have a complex permissions system, better hope you backed up first!
>
> I find this behavior highly undesirable, and consider it a bug.    The globals
> dump should just add users, and not delete any.

Unless the --clean option is passed, yes I agree with you. The other
issue is that it is silly to have to use pg_dumpall to get the globals.
A person should be able to pull a pg_dump on a particular database and
get everything that is required to run that database. Including users.

Joshua D. Drake


>


--
Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-667-4564 - jd@commandprompt.com - http://www.commandprompt.com
Mammoth PostgreSQL Replicator. Integrated Replication for PostgreSQL

Attachment

Re: pg_dump bug fixing

From
Josh Berkus
Date:
Tom,

> AFAICS that happens only if you've specified the -c (--clean) option.
> Hence, I don't think it's a bug.

Nope, happens even if you don't pass --clean.

-- 
-Josh BerkusAglio Database SolutionsSan Francisco



Re: pg_dump bug fixing

From
"Marc G. Fournier"
Date:
On Mon, 2 Aug 2004, Tom Lane wrote:

> Josh Berkus <josh@agliodbs.com> writes:
>> Problem:  the script which dumps globals such as users (pg_dumpall -g)
>> involves deleting *all* users from the pg_shadow table via a direct update to
>> that table.
>
> AFAICS that happens only if you've specified the -c (--clean) option.
> Hence, I don't think it's a bug.

Is --clean a new option?  Cause I've had the same thing happen to us also 
... thank god for backups :)

----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org           Yahoo!: yscrappy              ICQ: 7615664


Re: pg_dump bug fixing

From
Tom Lane
Date:
Josh Berkus <josh@agliodbs.com> writes:
>> AFAICS that happens only if you've specified the -c (--clean) option.
>> Hence, I don't think it's a bug.

> Nope, happens even if you don't pass --clean.

Not in CVS tip ... but you're right, older versions did act that way.
Looks like someone addressed this already.
        regards, tom lane


Re: pg_dump bug fixing

From
Christopher Kings-Lynne
Date:
> I've just found a bit of undesirable functionality which I would call a bug in 
> pg_dump.   I'm not sure everyone would, but we'll see.
> 
> Problem:  the script which dumps globals such as users (pg_dumpall -g) 
> involves deleting *all* users from the pg_shadow table via a direct update to 
> that table.   What this means in effect is that, should you attempt to use 
> "pg_dumpall -g" to *move* a set of users from one active server to another 
> (such as for transferring a database) the resulting pg_dump file will delete 
> all of the users which previously existed on that server.
> 
> This is a non-trivial accident to have happen on a shared machine; once users 
> are dumped, all of their ownerships and permissions go with them.   If you 
> have a complex permissions system, better hope you backed up first!
> 
> I find this behavior highly undesirable, and consider it a bug.    The globals 
> dump should just add users, and not delete any.

Yeah, it's nasty.  One of the fixes that's already in from me is to make 
the DELETE FROM pg_shadow and DELETE FROM pg_group only appear when -c 
mode is set.

Maybe even when -c mode is set we should use DROP USER commands?

Do others agree?

Chris



Re: pg_dump bug fixing

From
Christopher Kings-Lynne
Date:
>>>AFAICS that happens only if you've specified the -c (--clean) option.
>>>Hence, I don't think it's a bug.
> 
> 
>>Nope, happens even if you don't pass --clean.
> 
> 
> Not in CVS tip ... but you're right, older versions did act that way.
> Looks like someone addressed this already.

Yeah, was one of my fixes.

Should we change it to use DROP USER commands anyway?

Chris



Re: pg_dump bug fixing

From
Christopher Kings-Lynne
Date:
>> I find this behavior highly undesirable, and consider it a bug.    The 
>> globals dump should just add users, and not delete any.
> 
> 
> Unless the --clean option is passed, yes I agree with you. The other 
> issue is that it is silly to have to use pg_dumpall to get the globals. 
> A person should be able to pull a pg_dump on a particular database and 
> get everything that is required to run that database. Including users.

Another reason to combine pg_dumpall into pg_dump...

Chris



Re: pg_dump bug fixing

From
Josh Berkus
Date:
Chris,

> Another reason to combine pg_dumpall into pg_dump...

No argument here.   Are you thinking of that?

-- 
-Josh BerkusAglio Database SolutionsSan Francisco



Re: pg_dump bug fixing

From
Christopher Kings-Lynne
Date:
>>Another reason to combine pg_dumpall into pg_dump...
> 
> No argument here.   Are you thinking of that?

Yeah.  Would be a bit of work though.

Chris



Re: pg_dump bug fixing

From
Philip Warner
Date:
At 01:17 PM 3/08/2004, Christopher Kings-Lynne wrote:
>Would be a bit of work though.

I've been looking at this for a while now, and will probably give it a go 
for 7.6/8.



----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.B.N. 75 008 659 498)          |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 03 5330 3172          |                 ___________ |
Http://www.rhyme.com.au          |                /           \|                                 |    --________--
PGP key available upon request,  |  /
and from pgp.mit.edu:11371       |/ 



Re: pg_dump bug fixing

From
Christopher Kings-Lynne
Date:
> I've been looking at this for a while now, and will probably give it a 
> go for 7.6/8.

Let me know when you do, I'd be interested in collaborating.

Chris



Re: pg_dump bug fixing

From
Philip Warner
Date:
At 02:00 PM 3/08/2004, Christopher Kings-Lynne wrote:
>I'd be interested in collaborating.

Sounds good.


----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.B.N. 75 008 659 498)          |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 03 5330 3172          |                 ___________ |
Http://www.rhyme.com.au          |                /           \|                                 |    --________--
PGP key available upon request,  |  /
and from pgp.mit.edu:11371       |/ 



Re: pg_dump bug fixing

From
"Joshua D. Drake"
Date:
Christopher Kings-Lynne wrote:

>> I've been looking at this for a while now, and will probably give it 
>> a go for 7.6/8.
>
>
> Let me know when you do, I'd be interested in collaborating.
>
Command Prompt, if would help could help sponsor this project.

Sincerely,

Joshua D. Drake



> Chris



-- 
Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-667-4564 - jd@commandprompt.com - http://www.commandprompt.com
PostgreSQL Replicator -- production quality replication for PostgreSQL



Re: pg_dump bug fixing

From
Josh Berkus
Date:
Chris,

> I was thinking a pg_export utility that can output to a range of other
> databases SQL formats would also be a good idea.  It would share about
> 90% of the pg_dump code, but I'm trying to think of how to avoid
> duplicating the code.

I'm not really keen on this idea unless you're eager to make a 5-year 
commitment to maintain the code.   The load formats of other RDBMSes change 
all the time -- MySQL is a particularly egregious example, with 2 
incompatible changes in the last year -- and it would become a pain to keep 
track.   

More to the point, there are a number of 3rd-party OSS and proprietary 
utilities which can do this kind of format conversion.   For example, Perl 
DB::Interpolator will cover PG, MySQL, Oracle and MSSQL once that 
functionality is out of beta.

I can see, though, having a --strict-sql switch for pg_dump which would dump 
all database objects in strict SQL92, which should be pretty compatible with 
other systems.   This should also be easier to implement and trivial to 
maintain.  Though it would mean not dumping functions and doing a few data 
type conversions.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco


Re: pg_dump bug fixing

From
Christopher Kings-Lynne
Date:
> I'm not really keen on this idea unless you're eager to make a 5-year 
> commitment to maintain the code.   The load formats of other RDBMSes change 
> all the time -- MySQL is a particularly egregious example, with 2 
> incompatible changes in the last year -- and it would become a pain to keep 
> track.   

Well, I could do it on pgfoundry, but it would really suck to have to 
dupe all the pg_dump code.  Maybe I will have to.

> More to the point, there are a number of 3rd-party OSS and proprietary 
> utilities which can do this kind of format conversion.   For example, Perl 
> DB::Interpolator will cover PG, MySQL, Oracle and MSSQL once that 
> functionality is out of beta.

Do they convert the sql dumps or dump from the backend?  I really, 
really want to make a mysql2pgsql converter that doesn't really on text 
file parsing.  Modifying mysqldump would be easiest, but the problem is 
licensing I think...

> I can see, though, having a --strict-sql switch for pg_dump which would dump 
> all database objects in strict SQL92, which should be pretty compatible with 
> other systems.   This should also be easier to implement and trivial to 
> maintain.  Though it would mean not dumping functions and doing a few data 
> type conversions.

Yeah, perhaps.  And issuing a log of warnings so you can see what 
information you've lost.

Chris