Thread: Edge case problem with pg_dump
There seems to be a pg_dump issue with inherited tables when columns are added to the parent table after creating the child table. Here is how to repeat the problem. Create a test database and run the following SQL statements: create table a (x int); create table b (y int) inherits (a); alter table a add column z int; insert into b values (1, 2, 3); select * from b; You should see this: test1=# select * from b;x | y | z ---+---+---1 | 2 | 3(1 row) Now create a second test database and dump the first into the second: pg_dump test1 | psql test2 Now try that last select statement in the new database and you will see: test2=# select * from b;x | z | y ---+---+---1 | 2 | 3(1 row) If you are lucky the restore fails because of conflicting types. Worse, as in this example, the types are the same and it silently messes up your data by reversing the names on two columns. -- D'Arcy J.M. Cain <darcy@{druid|vex}.net> | Democracy is three wolves http://www.druid.net/darcy/ | and a sheep voting on +1 416 425 1212 (DoD#0082) (eNTP) | what's for dinner.
"D'Arcy J.M. Cain" <darcy@druid.net> writes: > There seems to be a pg_dump issue with inherited tables when columns > are added to the parent table after creating the child table. It's always been there --- ever tried dumping and reloading the regression database? Right now the only safe way to dump such a database is to use the inserts-with-explicit-column-names option. Someone was working on extending COPY to allow a column name list, and as soon as that gets done I intend to change pg_dump to specify a column name list in COPY commands. That should fix this problem. regards, tom lane
On May 22, 2002 10:28 am, you wrote: > Right now the only safe way to dump such a database is to use the > inserts-with-explicit-column-names option. Someone was working on > extending COPY to allow a column name list, and as soon as that gets > done I intend to change pg_dump to specify a column name list in > COPY commands. That should fix this problem. Do you mean issue COPY commands with fields or COPY out the fields in a specific order by using the extension in pg_dump? Seems like the latter would be cleaner but the former is probably a lot simpler to do. What would the new syntax of the COPY look like? -- D'Arcy J.M. Cain <darcy@{druid|vex}.net> | Democracy is three wolves http://www.druid.net/darcy/ | and a sheep voting on +1 416 425 1212 (DoD#0082) (eNTP) | what's for dinner.
"D'Arcy J.M. Cain" <darcy@druid.net> writes: > On May 22, 2002 10:28 am, you wrote: >> Right now the only safe way to dump such a database is to use the >> inserts-with-explicit-column-names option. Someone was working on >> extending COPY to allow a column name list, and as soon as that gets >> done I intend to change pg_dump to specify a column name list in >> COPY commands. That should fix this problem. > Do you mean issue COPY commands with fields or COPY out the fields in a > specific order by using the extension in pg_dump? I intended that the dump scripts would say something like COPY mytab(field1,field2,field3) FROM STDIN; which would make it absolutely clear what the dump's field order is. We can't solve it by reordering the fields while we dump, which is what I think you mean by the other alternative: how is pg_dump to guess what schema you are going to load the data into? For example, it should work to do a data-only dump and then reload into the existing table structure. So the dump script really needs to work for either column ordering in the destination table, and that's why we need explicit labeling of the field order in the script. If we take this really seriously we might want to eliminate pg_dump's -d (simple INSERT) option, and have only two dump formats: COPY with field labels, or INSERT with field labels. regards, tom lane
* Tom Lane <tgl@sss.pgh.pa.us> [020523 10:24]: > "D'Arcy J.M. Cain" <darcy@druid.net> writes: > > Do you mean issue COPY commands with fields or COPY out the fields in a > > specific order by using the extension in pg_dump? > > I intended that the dump scripts would say something like > > COPY mytab(field1,field2,field3) FROM STDIN; Cool. I assume that the "(field1,field2,field3)" would be optional for backwards compatibility. > which would make it absolutely clear what the dump's field order is. > We can't solve it by reordering the fields while we dump, which is > what I think you mean by the other alternative: how is pg_dump to > guess what schema you are going to load the data into? For example, Well, the issue now is that it creates the schema too but it is out of sync with the data it spits out. I can see how figuring it out is a lot more difficult though. The above works. > it should work to do a data-only dump and then reload into the existing > table structure. So the dump script really needs to work for either > column ordering in the destination table, and that's why we need > explicit labeling of the field order in the script. That's nice. I have scripts that effectively do this in code now when I have to dump from one schema and load into another. > If we take this really seriously we might want to eliminate pg_dump's > -d (simple INSERT) option, and have only two dump formats: COPY with > field labels, or INSERT with field labels. Yah, I don't think that I have ever used "-d". In fact, I bet I will hardly ever use "-D" any more if we make the above change. The only reason I ever used insert statements was to deal with loading into a different schema. So who was it that wanted to make this change. Perhaps I can help. -- D'Arcy J.M. Cain <darcy@{druid|vex}.net> | Democracy is three wolves http://www.druid.net/darcy/ | and a sheep voting on +1 416 425 1212 (DoD#0082) (eNTP) | what's for dinner.
"D'Arcy J.M. Cain" <darcy@druid.net> writes: > So who was it that wanted to make this change. Perhaps I can help. I forget who had volunteered to work on it, but it was several months ago and nothing's happened ... regards, tom lane
At 10:51 23/05/02 -0400, Tom Lane wrote: >"D'Arcy J.M. Cain" <darcy@druid.net> writes: > > So who was it that wanted to make this change. Perhaps I can help. > >I forget who had volunteered to work on it, but it was several months >ago and nothing's happened ... Not sure if this is the right reference, but about 30-Apr-2001, Alfred Perlstein raised the problem of column names in COPY, and you poured water on the idea: http://archives.postgresql.org/pgsql-hackers/2001-04/msg01132.php ISTM that we do need *some* solution to the problem, and that based on your comments there are a couple of possibilities: (a) AP: Allow COPY(OUT) to dump column info. Probably only the name of the column. Then (i'd guess) allow COPY(IN) to map named columns to new names, (b) TL: One possibility is to fix ALTER TABLE ADD COLUMN to maintain the same column ordering in parents and children. At the time you stated that: COPY with specified columns may in fact be the best way to deal with that particular issue, if pg_dump is all we careabout fixing. However there are a bunch of things that have a problem with it, not only pg_dump. See thread overin committers about functions and inheritance. I'm not sure what these issues are, but it does seem to me that some more portable COPY format would be desirable and that solution (b) will not solve the problem if you are trying to restore a table that has had an attr deleted. In your responses you also raised the problem of COPY having to know about default values for columns if we allow subsets of columns when we load data; does that mean that COPY does something more fancy than the equivalent of an INSERT? ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.B.N. 75 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 0500 83 82 82 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/
Philip Warner <pjw@rhyme.com.au> writes: > Not sure if this is the right reference, but about 30-Apr-2001, Alfred > Perlstein raised the problem of column names in COPY, and you poured water > on the idea: So I did, but I've changed my mind --- it would provide a usable solution to this inheritance problem, which has been with us forever, and would have other uses too. > (b) TL: One possibility is to fix ALTER TABLE ADD COLUMN to maintain the same > column ordering in parents and children. That would be a nice solution but I do not think it'll happen in the foreseeable future :-(. Certainly we're no closer to making it happen than we were a year ago. > In your responses you also raised the problem of COPY having to know about > default values for columns if we allow subsets of columns when we load > data; does that mean that COPY does something more fancy than the > equivalent of an INSERT? No, but it would have to be equivalent to an INSERT. BTW, the default-value mechanism is cleaner than it used to be and so this doesn't seem like as serious an objection anymore. Since COPY already has to have enough mechanism to evaluate constraint expressions, evaluating defaults too doesn't seem that horrid. regards, tom lane
[2002-05-23 10:51] Tom Lane said: | "D'Arcy J.M. Cain" <darcy@druid.net> writes: | > So who was it that wanted to make this change. Perhaps I can help. | | I forget who had volunteered to work on it, but it was several months | ago and nothing's happened ... I'd be the disappearing culprit... This patch _was_ mostly done at one point around 7.2 released, infact I've been running the patch on three production installs. I'll take a look at making the patch current and resubmitting. cheers. b -- "Develop your talent, man, and leave the world something. Records are really gifts from people. To think that an artist would love you enough to share his music with anyone is a beautiful thing." -- Duane Allman
[2002-05-25 11:44] Tom Lane said: | > In your responses you also raised the problem of COPY having to know about | > default values for columns if we allow subsets of columns when we load | > data; does that mean that COPY does something more fancy than the | > equivalent of an INSERT? | | No, but it would have to be equivalent to an INSERT. BTW, the | default-value mechanism is cleaner than it used to be and so this | doesn't seem like as serious an objection anymore. Since COPY already | has to have enough mechanism to evaluate constraint expressions, | evaluating defaults too doesn't seem that horrid. The last version of the COPY (attlist) patch does use column defaults. Again, I'll try to get this cleaned up over this (long) weekend. b -- "Develop your talent, man, and leave the world something. Records are really gifts from people. To think that an artist would love you enough to share his music with anyone is a beautiful thing." -- Duane Allman