Thread: pg_dump versus SERIAL, round N
We have still another complaint here: http://archives.postgresql.org/pgsql-bugs/2006-08/msg00109.php about pg_dump failing to cope nicely with any slightly-unusual condition related to a SERIAL column. We've had previous discussions about this, most recently this thread: http://archives.postgresql.org/pgsql-hackers/2006-04/msg01118.php but still no consensus on what to do. I'm pretty well convinced at this point that having pg_dump dump serial columns via SERIAL declarations is a failure. Doing it that way means that the underlying sequence may change names during reload, which pg_dump is not well prepared to cope with --- we have a hack in place for setval() but not for anything else, notably GRANT, nondefault ALTER SEQUENCE parameters, and references to the sequence from other tables. Even if there were a reasonably simple way to fix all those things, since when is it part of pg_dump's charter to editorialize on your database schema? The goal ought to be to reproduce the state of the database, not to "clean it up". I think that we ought to make pg_dump dump these things as if they were separate sequence objects, ie, in the style CREATE SEQUENCE foo_bar_seq ... any nondefault parameters here ... ... CREATE TABLE foo ( bar int default nextval('foo_bar_seq'), ... This fixes most of the problems at a stroke. The one thing it breaks is that after loading such a dump, there is no longer any dependency from the sequence to the column, and thus dropping the column wouldn't cause the sequence to go away, as it would've in the original database. The new thought I had this morning was to fix that head-on: provide a way for pg_dump to re-establish that dependency. If you were willing to load the dump as superuser it could just INSERT a row into pg_depend, but that's certainly not an acceptable assumption (and it wouldn't be future-proof anyway; we learned the folly of that with pg_dumpall's hacking on the pg_group table...) However, I see nothing wrong with providing a slightly more abstract way of declaring that a sequence "belongs to" a column. If we're willing to hack up the grammar a bit we could make ALTER SEQUENCE do it, perhaps ALTER SEQUENCE foo_bar_seq SERIAL FOR foo.bar; which I would propose being allowed to anyone who owns both the sequence and the table in question. Or the poor man's way to do it would involve creating a pg_set_serial_sequence() function that does the same thing. Further down the road we could consider allowing this command to drop a serial-sequence association or reattach a sequence to a different owning column, though pg_dump needs neither of these. This seems no uglier to me than the existing pg_dump hack that uses pg_get_serial_sequence(), and AFAICS it allows exact reproduction of the state of the database, even in the face of renames, ALTER COLUMN DEFAULT, etc. In terms of the discussion I mentioned above, this amounts to embracing the "SERIAL is a macro" philosophy rather than "SERIAL is a black box", and recognizing that there's one little piece of the implementation that still needs to be exposed so that we can describe exactly what the macro consists of. Comments? regards, tom lane
On Sat, Aug 19, 2006 at 11:47:39AM -0400, Tom Lane wrote: > ALTER SEQUENCE foo_bar_seq SERIAL FOR foo.bar; I like it, and I imagine users will love it too. Only one question: will a sequence be limited to belonging to one table at a time, or could you use one sequence for multiple tables and use this to declare a dependancy on them all? Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
Martijn van Oosterhout <kleptog@svana.org> writes: > On Sat, Aug 19, 2006 at 11:47:39AM -0400, Tom Lane wrote: >> ALTER SEQUENCE foo_bar_seq SERIAL FOR foo.bar; > I like it, and I imagine users will love it too. Only one question: > will a sequence be limited to belonging to one table at a time, or > could you use one sequence for multiple tables and use this to declare > a dependancy on them all? AFAICS it doesn't make sense to have the sequence "belong to" more than one column at a time. Keep in mind that writingDEFAULT nextval('foo_seq') already causes the default expression to depend on foo_seq, so that if you drop foo_seq the default will go away too (if CASCADE) or cause an error (if RESTRICT). What's at stake here is whether dropping a column should cause the sequence to go away, and I can't really see that you'd want this to be true for more than one column. So my proposal is for the ALTER SEQUENCE command to be defined as "remove any existing internal dependency linking from the sequence to any table column, then add one linking to this column". Also, after thinking about the existing behavior of ALTER TABLE OWNER (it tries to keep ownership of dependent sequences equal to the table's ownership), we'd have to either abandon that or insist that you can only link a sequence to a table having the same owner. So that's another reason for not allowing a sequence to be linked to multiple tables --- ALTER TABLE OWNER would inevitably create a mess. regards, tom lane
I wrote: > Also, after thinking about the existing behavior of ALTER TABLE OWNER > (it tries to keep ownership of dependent sequences equal to the table's > ownership), we'd have to either abandon that or insist that you can > only link a sequence to a table having the same owner. So that's > another reason for not allowing a sequence to be linked to multiple > tables --- ALTER TABLE OWNER would inevitably create a mess. After further reflection on that point, I'm thinking that the ALTER command should explicitly use the notion of "ownership" rather than referencing SERIAL as such. So here's a concrete proposal: ALTER SEQUENCE sequence_name OWNED BY table_name.column_nameALTER SEQUENCE sequence_name OWNED BY NONE This requires no keywords we don't already have. Restrictions would be * you must have ownership permissions on the sequence * in the first case, the table and sequence must have identical owners (not necessarily you, consider ownership by a grouprole) and must be in the same schema. This maintains invariants that are already preserved by ALTER TABLE. I'm also inclined to change the type of the dependency from INTERNAL to AUTO. Per comments in dependency.h: * DEPENDENCY_AUTO ('a'): the dependent object can be dropped separately* from the referenced object, and should be automaticallydropped* (regardless of RESTRICT or CASCADE mode) if the referenced object* is dropped.* Example: a named constrainton a table is made auto-dependent on* the table, so that it will go away if the table is dropped.** DEPENDENCY_INTERNAL('i'): the dependent object was created as part* of creation of the referenced object, and is really justa part of* its internal implementation. A DROP of the dependent object will be* disallowed outright (we'll tell theuser to issue a DROP against the* referenced object, instead). A DROP of the referenced object will be* propagated throughto drop the dependent object whether CASCADE is* specified or not.* Example: a trigger that's created to enforce aforeign-key constraint* is made internally dependent on the constraint's pg_constraint entry. Basically this change would mean that you'd be allowed to DROP the sequence with CASCADE (hence removing all the DEFAULT expressions that use it) without being forced to drop the owning column as such. That seems to square better with the idea that the column "owns" the sequence. In this new approach I don't think we are considering the sequence as an integral part of the column's implementation, so INTERNAL seems too strong. BTW, will anyone object to doing this now, ie, for 8.2? I claim it's a bug fix not a new feature ;-) regards, tom lane
Tom Lane wrote: > Basically this change would mean that you'd be allowed to DROP the > sequence with CASCADE (hence removing all the DEFAULT expressions that > use it) without being forced to drop the owning column as such. That > seems to square better with the idea that the column "owns" the > sequence. In this new approach I don't think we are considering the > sequence as an integral part of the column's implementation, so > INTERNAL seems too strong. Basically as I see it, instead of making SERIAL a macro in the database, you are making it a macro in pg_dump, and on reload, you are creating a way to relink the sequence to the column so it still functions as a black box when in the database. That seems like a fine solution. Our two SERIAL TODO items are: * %Disallow changing DEFAULT expression of a SERIAL column? This should be done only if the existing SERIAL problems cannotbe fixed.* %Disallow ALTER SEQUENCE changes for SERIAL sequences because pg_dump does not dump the changes How would your proposal handle these cases? Would changing the default of a SERIAL column detach the column/sequence dependency? I would think so, and dump/reload would work fine. And ALTER SEQUENCE would dump fine too, because you are doing the split in pg_dump? > BTW, will anyone object to doing this now, ie, for 8.2? I claim it's a > bug fix not a new feature ;-) I knew that part was coming, and obviously you knew I knew too. ;-) -- Bruce Momjian bruce@momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian <bruce@momjian.us> writes: > Our two SERIAL TODO items are: > * %Disallow changing DEFAULT expression of a SERIAL column? > This should be done only if the existing SERIAL problems cannot be > fixed. > * %Disallow ALTER SEQUENCE changes for SERIAL sequences because pg_dump > does not dump the changes > How would your proposal handle these cases? Both those TODOs go into the bit bucket. There won't be any reason to forbid either, because pg_dump will do the right things. > Would changing the default > of a SERIAL column detach the column/sequence dependency? As proposed, changing the default would not delete the OWNED BY dependency, but you could do that by hand if it were no longer appropriate. I don't see any risk of accidentally dropping a still-used sequence, because of the already-added dependencies for nextval() references. Consider CREATE TABLE t1 (f1 serial); CREATE TABLE t2 (f2 int default nextval('t1_f1_seq')); At this point there's a normal dependency from t1.f1's default expression to t1_f1_seq, and another one from t2.f2's default. With my proposal there would also be an auto (not internal anymore) dependency from t1_f1_seq to the column t1.f1. If you now do ALTER TABLE t1 ALTER COLUMN f1 DROP DEFAULT then the first of the aforementioned dependencies goes away, but the other two are still there. If you now try, say, DROP TABLE t1; it will auto-cascade from t1 to t1_f1_seq, then try to normal cascade to t2.f2's default, and there error out because you didn't say CASCADE. At this point you could either CASCADE (and lose the default for t2.f2) or do ALTER SEQUENCE to move or drop the OWNED BY link. Almost everything I just said is already how it works today; the difference is that today you do not have the option to drop t1 without dropping the sequence, because there's no (non-hack) way to remove the dependency. regards, tom lane
Tom Lane wrote: > Bruce Momjian <bruce@momjian.us> writes: > > Our two SERIAL TODO items are: > > > * %Disallow changing DEFAULT expression of a SERIAL column? > > > This should be done only if the existing SERIAL problems cannot be > > fixed. > > > * %Disallow ALTER SEQUENCE changes for SERIAL sequences because pg_dump > > does not dump the changes > > > How would your proposal handle these cases? > > Both those TODOs go into the bit bucket. There won't be any reason > to forbid either, because pg_dump will do the right things. Bit bucket is good. :-) > > Would changing the default > > of a SERIAL column detach the column/sequence dependency? > > As proposed, changing the default would not delete the OWNED BY > dependency, but you could do that by hand if it were no longer > appropriate. I don't see any risk of accidentally dropping a > still-used sequence, because of the already-added dependencies > for nextval() references. Consider > > CREATE TABLE t1 (f1 serial); > > CREATE TABLE t2 (f2 int default nextval('t1_f1_seq')); > > At this point there's a normal dependency from t1.f1's default > expression to t1_f1_seq, and another one from t2.f2's default. > With my proposal there would also be an auto (not internal > anymore) dependency from t1_f1_seq to the column t1.f1. > > If you now do > > ALTER TABLE t1 ALTER COLUMN f1 DROP DEFAULT > > then the first of the aforementioned dependencies goes away, > but the other two are still there. If you now try, say, > > DROP TABLE t1; > > it will auto-cascade from t1 to t1_f1_seq, then try to normal cascade > to t2.f2's default, and there error out because you didn't say CASCADE. > At this point you could either CASCADE (and lose the default for t2.f2) > or do ALTER SEQUENCE to move or drop the OWNED BY link. > > Almost everything I just said is already how it works today; the > difference is that today you do not have the option to drop t1 without > dropping the sequence, because there's no (non-hack) way to remove the > dependency. Sounds good. The only user-visible change is that pg_dump no longer dumps out "SERIAL". but psql \d doesn't show SERIAL either, so I don't see any problem with that. The only complaint I can see is that someone who wants pg_dump to dump out SERIAL so it appears just as he created the table, doesn't get that. Could we have pg_dump do that if the sequences all match the creation (weren't modified)? -- Bruce Momjian bruce@momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian <bruce@momjian.us> writes: > The only complaint I can see is that someone > who wants pg_dump to dump out SERIAL so it appears just as he created > the table, doesn't get that. Could we have pg_dump do that if the > sequences all match the creation (weren't modified)? pg_dump's output is often very far from what you typed in anyway. My response to that person would be "get a life" ... regards, tom lane
Tom Lane wrote: > > Almost everything I just said is already how it works today; the > difference is that today you do not have the option to drop t1 without > dropping the sequence, because there's no (non-hack) way to remove the > dependency. > As far as I understand your proposal I like it, but I'd like to insure that the situation where a sequence is used by multiple tables is handled correctly. There _are_ databases that reuse a sequence for multiple serial-like columns, and pgadmin supports this (including a pg_depend insert, which would need a version dependent fix). Regards, Andreas
Andreas Pflug wrote: > Tom Lane wrote: >> >> Almost everything I just said is already how it works today; the >> difference is that today you do not have the option to drop t1 without >> dropping the sequence, because there's no (non-hack) way to remove the >> dependency. >> > As far as I understand your proposal I like it, but I'd like to insure > that the situation where a sequence is used by multiple tables is > handled correctly. There _are_ databases that reuse a sequence for > multiple serial-like columns, and pgadmin supports this (including a > pg_depend insert, which would need a version dependent fix). > If we were implementing serial from scratch, I would be arguing that the underlying sequence should be merely an implementation detail that should be totally hidden, and sequences used explicitly should be kept as a separate concept. Then many of these problems simply wouldn't exist. I realise that might be difficult to get to now :-( cheers andrew
Andreas Pflug <pgadmin@pse-consulting.de> writes: > As far as I understand your proposal I like it, but I'd like to insure > that the situation where a sequence is used by multiple tables is > handled correctly. There _are_ databases that reuse a sequence for > multiple serial-like columns, and pgadmin supports this (including a > pg_depend insert, which would need a version dependent fix). What do you think is "correctly"? It's already the case in 8.1 that saying DEFAULT nextval('foo_seq') generates a normal dependency from the column default expression to foo_seq, which means that you can't drop the sequence without saying CASCADE, and if you say that then all the dependent defaults are dropped. My recommendation for a multi-table scenario would be that the sequence be created "by hand" as a standalone object, in which case those normal dependencies are the only ones involved, and my proposal won't change the behavior at all from what it was in 8.1. If you insist on initially creating the sequence by saying SERIAL for the first of the tables, and then saying DEFAULT nextval('foo_seq') for the rest, then under both 8.1 and my proposal you'd not be able to drop the first table without dropping the sequence (thus requiring you to say CASCADE so that the other tables' defaults can be dropped). The difference is that I'm proposing a way to decouple the sequence from its original owning column and make it into a true freestanding object, after which you could drop the first table without losing the sequence and the other defaults. Basically the proposed command allows you to convert from the case where a sequence was created by SERIAL to the case where it was created free-standing, or vice versa. The other change is that using an AUTO instead of INTERNAL dependency makes it legal to drop the sequence without dropping the column. My testing version does this: regression=# create table zit (f1 serial); NOTICE: CREATE TABLE will create implicit sequence "zit_f1_seq" for serial column "zit.f1" CREATE TABLE regression=# drop sequence zit_f1_seq; NOTICE: default for table zit column f1 depends on sequence zit_f1_seq ERROR: cannot drop sequence zit_f1_seq because other objects depend on it HINT: Use DROP ... CASCADE to drop the dependent objects too. regression=# drop sequence zit_f1_seq cascade; NOTICE: drop cascades to default for table zit column f1 DROP SEQUENCE regression=# \d zit Table "public.zit"Column | Type | Modifiers --------+---------+-----------f1 | integer | not null regression=# Previous versions would have disallowed that DROP even with CASCADE; the *only* way to get rid of a SERIAL-created sequence was to drop the owning column (or its whole table) altogether. If you try the same thing in 8.1 you get regression=# drop sequence zit_f1_seq cascade; ERROR: cannot drop sequence zit_f1_seq because table zit column f1 requires it HINT: You may drop table zit column f1 instead. AFAICS this doesn't disallow anything you could do before, and it allows fixing the problems pg_dump is having. Is there something you need it to do that it doesn't do? regards, tom lane
"Andrew Dunstan" <andrew@dunslane.net> writes: > If we were implementing serial from scratch, I would be arguing that the > underlying sequence should be merely an implementation detail that should > be totally hidden, and sequences used explicitly should be kept as a > separate concept. Then many of these problems simply wouldn't exist. I > realise that might be difficult to get to now :-( Well, we're not in a green field anymore :-(. In any case there would be some serious practical disadvantages in trying to hide the underlying sequence fully: * you couldn't use ALTER SEQUENCE, eg to adjust the sequence's CYCLE property, which seems like a useful thing to do; * permissions management would get interesting too; * how's pg_dump going to access the sequence to restore its correct count value etc? I think we'd end up building a lot of facilities parallel to those that exist for "ordinary" sequences, and then this doesn't seem like such a clean solution anymore... regards, tom lane
Tom Lane wrote: > If you insist on initially creating the sequence by saying SERIAL for > the first of the tables, and then saying DEFAULT nextval('foo_seq') > for the rest, then under both 8.1 and my proposal you'd not be able to > drop the first table without dropping the sequence (thus requiring you > to say CASCADE so that the other tables' defaults can be dropped). > The difference is that I'm proposing a way to decouple the sequence from > its original owning column and make it into a true freestanding object, > after which you could drop the first table without losing the sequence and > the other defaults. > For decoupling, you'd require ALTER SEQUENCE ... OWNER BY NONE to be executed, right? I basically doubt the concept of a single owner. I'd expect a sequence to be dropped from cascaded table dropping, if that was the last usage and dependencies existed. This would probably mean "multiple owners". > Basically the proposed command allows you to convert from the case where > a sequence was created by SERIAL to the case where it was created > free-standing, or vice versa. > > The other change is that using an AUTO instead of INTERNAL dependency > makes it legal to drop the sequence without dropping the column. > Sounds fine. > > AFAICS this doesn't disallow anything you could do before, and it > allows fixing the problems pg_dump is having. Is there something > you need it to do that it doesn't do? > Sequence cleanup with multiple tables (multiple owners). Regards, Andreas
Andreas Pflug <pgadmin@pse-consulting.de> writes: > I basically doubt the concept of a single owner. I'd expect a sequence > to be dropped from cascaded table dropping, if that was the last usage > and dependencies existed. This would probably mean "multiple owners". That's not going to happen without extensive revisions to our dependency mechanisms, which I am not about to undertake now. And I don't see the point anyway. If you did have a sequence being used to feed multiple tables, why would you want it to go away if the number of tables dropped transiently to zero? If you then want to add back another table being fed by that sequence, you've lost the state of the sequence. That's the same sort of corner case that prompted us to allow zero-column tables, ie, the table can continue to exist even if it momentarily has no columns. I see SERIAL as a simple shorthand for a common case, not some hydra-headed beast ... regards, tom lane
Bruce Momjian <bruce@momjian.us> writes: > What method will people use to see if a sequence used as a default is > one that was created by SERIAL, and will be dropped by drop table, or > manually created? How does that distinction show up in pg_dump? Hm. It will show in pg_dump because there will (or won't) be an ALTER SEQUENCE OWNED BY command, but right now the only way to see if a sequence is owned is to look in pg_depend for a link to a table column. That's how it's always been before, too --- have you noticed any complaints? We could consider adding something to psql's \ds display to show ownership, but that's definitely getting into the realm of "new feature" rather than "bug fix", and given the lack of past requests for it I can't say that I find it to be an immediate must-have. regards, tom lane
Tom Lane wrote: > Andreas Pflug <pgadmin@pse-consulting.de> writes: > > I basically doubt the concept of a single owner. I'd expect a sequence > > to be dropped from cascaded table dropping, if that was the last usage > > and dependencies existed. This would probably mean "multiple owners". > > That's not going to happen without extensive revisions to our dependency > mechanisms, which I am not about to undertake now. And I don't see the > point anyway. If you did have a sequence being used to feed multiple > tables, why would you want it to go away if the number of tables dropped > transiently to zero? If you then want to add back another table being > fed by that sequence, you've lost the state of the sequence. That's the > same sort of corner case that prompted us to allow zero-column tables, > ie, the table can continue to exist even if it momentarily has no > columns. Agreed. I think it makes sense that if SERIAL auto-created the column, dropping the table should remove the sequence, but if the sequence was created manually, drop table should not drop it automatically. What method will people use to see if a sequence used as a default is one that was created by SERIAL, and will be dropped by drop table, or manually created? How does that distinction show up in pg_dump? -- Bruce Momjian bruce@momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Tom Lane wrote: > Bruce Momjian <bruce@momjian.us> writes: > > What method will people use to see if a sequence used as a default is > > one that was created by SERIAL, and will be dropped by drop table, or > > manually created? How does that distinction show up in pg_dump? > > Hm. It will show in pg_dump because there will (or won't) be an ALTER > SEQUENCE OWNED BY command, but right now the only way to see if a > sequence is owned is to look in pg_depend for a link to a table column. > That's how it's always been before, too --- have you noticed any > complaints? > > We could consider adding something to psql's \ds display to show > ownership, but that's definitely getting into the realm of "new feature" > rather than "bug fix", and given the lack of past requests for it > I can't say that I find it to be an immediate must-have. Right. My only point is that right now SERIAL shows up in pg_dump, while in the future it will show up as SEQUENCE OWNED BY. We just need to look out if people get confused. Also, if someone restores one table, does the sequence come with it like it does now with SERIAL? -- Bruce Momjian bruce@momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian <bruce@momjian.us> writes: > Also, if someone restores one table, does the sequence come with it like > it does now with SERIAL? Hm, probably not. I do have pg_dump set to force dumping of the sequence if you try to dump just its table, but it'd be possible to tell pg_restore (via -l) to load only the table, which could fail. I'm not too excited about that though; there are plenty of other ways to tell pg_restore to restore a database subset that won't work. regards, tom lane
Bruce Momjian <bruce@momjian.us> writes: > What method will people use to see if a sequence used as a default is > one that was created by SERIAL, and will be dropped by drop table, or > manually created? How does that distinction show up in pg_dump? BTW, it's easy to see if a column has an associated sequence: pg_get_serial_sequence(). It's going from the sequence to the column that requires manually looking into pg_depend. regards, tom lane
On Sat, 2006-08-19 at 16:01 -0400, Tom Lane wrote: > Bruce Momjian <bruce@momjian.us> writes: > > The only complaint I can see is that someone > > who wants pg_dump to dump out SERIAL so it appears just as he created > > the table, doesn't get that. Could we have pg_dump do that if the > > sequences all match the creation (weren't modified)? > > pg_dump's output is often very far from what you typed in anyway. > My response to that person would be "get a life" ... > As the one who brought up the issue (again) I'd like to add that the suggested solution sounds good to me. Thanks a lot for your great work on postgres. You rock! Morus