Thread: Edge case problem with pg_dump

Edge case problem with pg_dump

From
"D'Arcy J.M. Cain"
Date:
There seems to be a pg_dump issue with inherited tables when columns
are added to the parent table after creating the child table.  Here is
how to repeat the problem.

Create a test database and run the following SQL statements:

create table a (x int);
create table b (y int) inherits (a);
alter table a add column z int;
insert into b values (1, 2, 3);
select * from b;

You should see this:

test1=# select * from b;x | y | z
---+---+---1 | 2 | 3(1 row)

Now create a second test database and dump the first into the second:

pg_dump test1 | psql test2

Now try that last select statement in the new database and you will see:

test2=# select * from b;x | z | y
---+---+---1 | 2 | 3(1 row)

If you are lucky the restore fails because of conflicting types.  Worse, as
in this example, the types are the same and it silently messes up your data
by reversing the names on two columns.

-- 
D'Arcy J.M. Cain <darcy@{druid|vex}.net>   |  Democracy is three wolves
http://www.druid.net/darcy/                |  and a sheep voting on
+1 416 425 1212     (DoD#0082)    (eNTP)   |  what's for dinner.


Re: Edge case problem with pg_dump

From
Tom Lane
Date:
"D'Arcy J.M. Cain" <darcy@druid.net> writes:
> There seems to be a pg_dump issue with inherited tables when columns
> are added to the parent table after creating the child table.

It's always been there --- ever tried dumping and reloading the
regression database?

Right now the only safe way to dump such a database is to use the
inserts-with-explicit-column-names option.  Someone was working on
extending COPY to allow a column name list, and as soon as that gets
done I intend to change pg_dump to specify a column name list in
COPY commands.  That should fix this problem.
        regards, tom lane


Re: Edge case problem with pg_dump

From
"D'Arcy J.M. Cain"
Date:
On May 22, 2002 10:28 am, you wrote:
> Right now the only safe way to dump such a database is to use the
> inserts-with-explicit-column-names option.  Someone was working on
> extending COPY to allow a column name list, and as soon as that gets
> done I intend to change pg_dump to specify a column name list in
> COPY commands.  That should fix this problem.

Do you mean issue COPY commands with fields or COPY out the fields in a 
specific order by using the extension in pg_dump?  Seems like the latter 
would be cleaner but the former is probably a lot simpler to do.

What would the new syntax of the COPY look like?

-- 
D'Arcy J.M. Cain <darcy@{druid|vex}.net>   |  Democracy is three wolves
http://www.druid.net/darcy/                |  and a sheep voting on
+1 416 425 1212     (DoD#0082)    (eNTP)   |  what's for dinner.


Re: Edge case problem with pg_dump

From
Tom Lane
Date:
"D'Arcy J.M. Cain" <darcy@druid.net> writes:
> On May 22, 2002 10:28 am, you wrote:
>> Right now the only safe way to dump such a database is to use the
>> inserts-with-explicit-column-names option.  Someone was working on
>> extending COPY to allow a column name list, and as soon as that gets
>> done I intend to change pg_dump to specify a column name list in
>> COPY commands.  That should fix this problem.

> Do you mean issue COPY commands with fields or COPY out the fields in a 
> specific order by using the extension in pg_dump?

I intended that the dump scripts would say something like
COPY mytab(field1,field2,field3) FROM STDIN;

which would make it absolutely clear what the dump's field order is.
We can't solve it by reordering the fields while we dump, which is
what I think you mean by the other alternative: how is pg_dump to
guess what schema you are going to load the data into?  For example,
it should work to do a data-only dump and then reload into the existing
table structure.  So the dump script really needs to work for either
column ordering in the destination table, and that's why we need
explicit labeling of the field order in the script.

If we take this really seriously we might want to eliminate pg_dump's
-d (simple INSERT) option, and have only two dump formats: COPY with
field labels, or INSERT with field labels.
        regards, tom lane


Re: Edge case problem with pg_dump

From
"D'Arcy J.M. Cain"
Date:
* Tom Lane <tgl@sss.pgh.pa.us> [020523 10:24]:
> "D'Arcy J.M. Cain" <darcy@druid.net> writes:
> > Do you mean issue COPY commands with fields or COPY out the fields in a 
> > specific order by using the extension in pg_dump?
> 
> I intended that the dump scripts would say something like
> 
>     COPY mytab(field1,field2,field3) FROM STDIN;

Cool.  I assume that the "(field1,field2,field3)" would be optional for
backwards compatibility.

> which would make it absolutely clear what the dump's field order is.
> We can't solve it by reordering the fields while we dump, which is
> what I think you mean by the other alternative: how is pg_dump to
> guess what schema you are going to load the data into?  For example,

Well, the issue now is that it creates the schema too but it is out of
sync with the data it spits out.  I can see how figuring it out is a lot
more difficult though.  The above works.

> it should work to do a data-only dump and then reload into the existing
> table structure.  So the dump script really needs to work for either
> column ordering in the destination table, and that's why we need
> explicit labeling of the field order in the script.

That's nice.  I have scripts that effectively do this in code now when
I have to dump from one schema and load into another.

> If we take this really seriously we might want to eliminate pg_dump's
> -d (simple INSERT) option, and have only two dump formats: COPY with
> field labels, or INSERT with field labels.

Yah, I don't think that I have ever used "-d".  In fact, I bet I will
hardly ever use "-D" any more if we make the above change.  The only
reason I ever used insert statements was to deal with loading into a
different schema.

So who was it that wanted to make this change.  Perhaps I can help.

-- 
D'Arcy J.M. Cain <darcy@{druid|vex}.net>   |  Democracy is three wolves
http://www.druid.net/darcy/                |  and a sheep voting on
+1 416 425 1212     (DoD#0082)    (eNTP)   |  what's for dinner.


Re: Edge case problem with pg_dump

From
Tom Lane
Date:
"D'Arcy J.M. Cain" <darcy@druid.net> writes:
> So who was it that wanted to make this change.  Perhaps I can help.

I forget who had volunteered to work on it, but it was several months
ago and nothing's happened ...
        regards, tom lane


Re: Edge case problem with pg_dump

From
Philip Warner
Date:
At 10:51 23/05/02 -0400, Tom Lane wrote:
>"D'Arcy J.M. Cain" <darcy@druid.net> writes:
> > So who was it that wanted to make this change.  Perhaps I can help.
>
>I forget who had volunteered to work on it, but it was several months
>ago and nothing's happened ...

Not sure if this is the right reference, but about 30-Apr-2001, Alfred 
Perlstein raised the problem of column names in COPY, and you poured water 
on the idea:
    http://archives.postgresql.org/pgsql-hackers/2001-04/msg01132.php

ISTM that we do need *some* solution to the problem, and that based on your 
comments there are a couple of possibilities:

(a) AP: Allow COPY(OUT) to dump column info. Probably only the name of the 
column. Then (i'd guess) allow COPY(IN) to map named columns to new names,

(b) TL: One possibility is to fix ALTER TABLE ADD COLUMN to maintain the same
column ordering in parents and children.

At the time you stated that:
    COPY with specified columns may in fact be the best way to deal with    that particular issue, if pg_dump is all we
careabout fixing.  However    there are a bunch of things that have a problem with it, not only    pg_dump.  See thread
overin committers about functions and inheritance.
 

I'm not sure what these issues are, but it does seem to me that some more 
portable COPY format would be desirable and that solution (b) will not 
solve the problem if you are trying to restore a table that has had an attr 
deleted.

In your responses you also raised the problem of COPY having to know about 
default values for columns if we allow subsets of columns when we load 
data; does that mean that COPY does something more fancy than the 
equivalent of an INSERT?




----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.B.N. 75 008 659 498)          |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 0500 83 82 82         |                 ___________ |
Http://www.rhyme.com.au          |                /           \|                                 |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/



Re: Edge case problem with pg_dump

From
Tom Lane
Date:
Philip Warner <pjw@rhyme.com.au> writes:
> Not sure if this is the right reference, but about 30-Apr-2001, Alfred 
> Perlstein raised the problem of column names in COPY, and you poured water 
> on the idea:

So I did, but I've changed my mind --- it would provide a usable solution
to this inheritance problem, which has been with us forever, and would
have other uses too.

> (b) TL: One possibility is to fix ALTER TABLE ADD COLUMN to maintain the same
> column ordering in parents and children.

That would be a nice solution but I do not think it'll happen in the
foreseeable future :-(.  Certainly we're no closer to making it happen
than we were a year ago.

> In your responses you also raised the problem of COPY having to know about 
> default values for columns if we allow subsets of columns when we load 
> data; does that mean that COPY does something more fancy than the 
> equivalent of an INSERT?

No, but it would have to be equivalent to an INSERT.  BTW, the
default-value mechanism is cleaner than it used to be and so this
doesn't seem like as serious an objection anymore.  Since COPY already
has to have enough mechanism to evaluate constraint expressions,
evaluating defaults too doesn't seem that horrid.
        regards, tom lane


Re: Edge case problem with pg_dump

From
Brent Verner
Date:
[2002-05-23 10:51] Tom Lane said:
| "D'Arcy J.M. Cain" <darcy@druid.net> writes:
| > So who was it that wanted to make this change.  Perhaps I can help.
| 
| I forget who had volunteered to work on it, but it was several months
| ago and nothing's happened ...

I'd be the disappearing culprit...  This patch _was_ mostly done at one
point around 7.2 released, infact I've been running the patch on three
production installs.  I'll take a look at making the patch current
and resubmitting.

cheers. b

-- 
"Develop your talent, man, and leave the world something. Records are 
really gifts from people. To think that an artist would love you enough
to share his music with anyone is a beautiful thing."  -- Duane Allman


Re: Edge case problem with pg_dump

From
Brent Verner
Date:
[2002-05-25 11:44] Tom Lane said:
| > In your responses you also raised the problem of COPY having to know about 
| > default values for columns if we allow subsets of columns when we load 
| > data; does that mean that COPY does something more fancy than the 
| > equivalent of an INSERT?
| 
| No, but it would have to be equivalent to an INSERT.  BTW, the
| default-value mechanism is cleaner than it used to be and so this
| doesn't seem like as serious an objection anymore.  Since COPY already
| has to have enough mechanism to evaluate constraint expressions,
| evaluating defaults too doesn't seem that horrid.

The last version of the COPY (attlist) patch does use column defaults.
Again, I'll try to get this cleaned up over this (long) weekend.
 b

-- 
"Develop your talent, man, and leave the world something. Records are 
really gifts from people. To think that an artist would love you enough
to share his music with anyone is a beautiful thing."  -- Duane Allman