Thread: Dump serials as serial -- not a sequence
- Fix an issue indexing domains. Ensure we look for the operator class of the base type, not the domain type. - Add an is_serial boolean attribute to sequence tables. - Add serial4 and serial8 domains with respective serial and bigserial aliases (similar to int4 and int8 to integer and bigint) - Jump through hoops to deal with a primary key'd serial column, alter table add primary key doesn't work for these. - Fix type sanity checks to account for domains. End result, pg_dump no longer uses sequences for serials and psql shows serials rather than integer columns.
Attachment
Your patch has been added to the PostgreSQL unapplied patches list at: http://candle.pha.pa.us/cgi-bin/pgpatches I will try to apply it within the next 48 hours. --------------------------------------------------------------------------- Rod Taylor wrote: > - Fix an issue indexing domains. Ensure we look for the operator class > of the base type, not the domain type. > - Add an is_serial boolean attribute to sequence tables. > - Add serial4 and serial8 domains with respective serial and bigserial > aliases (similar to int4 and int8 to integer and bigint) > - Jump through hoops to deal with a primary key'd serial column, alter > table add primary key doesn't work for these. > - Fix type sanity checks to account for domains. > > End result, pg_dump no longer uses sequences for serials and psql shows > serials rather than integer columns. [ Attachment, skipping... ] > > ---------------------------(end of broadcast)--------------------------- > TIP 6: Have you searched our list archives? > > http://archives.postgresql.org -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
Rod Taylor <rbt@zort.ca> writes: > - Add an is_serial boolean attribute to sequence tables. > - Add serial4 and serial8 domains with respective serial and bigserial > aliases (similar to int4 and int8 to integer and bigint) > - Jump through hoops to deal with a primary key'd serial column, alter > table add primary key doesn't work for these. > - Fix type sanity checks to account for domains. > End result, pg_dump no longer uses sequences for serials and psql shows > serials rather than integer columns. I think we have to reject this patch. The killer problem with turning serial columns into a separate serial domain type is that it will break existing clients, which are expecting the column type to look like plain int4 (or int8 if they are new enough to know about bigserial). As an example, the ODBC driver will fail to recognize "serial4" as a numeric datatype at all; I suspect the same is true of JDBC. We could run around and try to fix the clients --- indeed, I suspect the ODBC/JDBC drivers will need some kind of generic way of dealing with domain types. (Anyone for exposing getBaseType as a SQL function?) However, the lack of domain support in old clients doesn't break those clients as long as they're dealing with existing schemas, which haven't got domains ... unless we try to change serial into a domain. Then we'd definitely be creating a compatibility issue for existing applications. The bottom line is that I do not think it's worth the trouble to change serial into a domain, because the entire motivation for this was to make pg_dump simpler, and a look at your patch shows that it's made pg_dump's life harder not easier. The reason this isn't good for pg_dump is that inherited serial columns don't work nicely for pg_dump if it tries to use the column datatype to track them. What pg_dump wants is to consider only the parent table to have a true "serial" column, while treating child tables as plain int columns with a default clause, same as it ever was. But expressing serial-ness as a datatype causes it to inherit, and then you have to jump through hoops to stop the child tables from being treated as having serial columns. I think the right way to attack this is to leave the backend code and database representation for serials alone. pg_dump should be taught to recognize a serial-column association on the basis of there being an internal-type dependency between an int column and a sequence. Since the dependency only exists between the sequence and the original parent table, there's no problem with doing the wrong thing for child tables. (Since int columns are probably more common than sequence objects, I'd be inclined to do the join against pg_depend only when looking at a sequence object, and if we get a match then to look through pg_dump's internal list of tables to find and mark the corresponding column.) psql's \d command could also be made to recognize serials the same way, though I am inclined to leave it alone. I rather like the fact that \d shows me the DEFAULT expression for a serial, because that way I can easily see the name of the underlying sequence, in case I want to do manual operations on the sequence. Do you want to attack it this way, or shall I? regards, tom lane
On Sat, 2002-08-17 at 14:16, Tom Lane wrote: > Rod Taylor <rbt@zort.ca> writes: > > - Add an is_serial boolean attribute to sequence tables. > > - Add serial4 and serial8 domains with respective serial and bigserial > > aliases (similar to int4 and int8 to integer and bigint) > > - Jump through hoops to deal with a primary key'd serial column, alter > > table add primary key doesn't work for these. > > - Fix type sanity checks to account for domains. > > End result, pg_dump no longer uses sequences for serials and psql shows > > serials rather than integer columns. > The bottom line is that I do not think it's worth the trouble to change > serial into a domain, because the entire motivation for this was to make > pg_dump simpler, and a look at your patch shows that it's made pg_dump's > life harder not easier. I was also trying to make it easier for frontends to determine an auto-incrementing column from one with a simple default. Right now the test is a default with nextval() in it. However, I could also accommodate that with a simple pg_is_serial(table oid, attnum) style function -- pg_dump can use it too I suppose. > I think the right way to attack this is to leave the backend code and > database representation for serials alone. pg_dump should be taught to > recognize a serial-column association on the basis of there being an > internal-type dependency between an int column and a sequence. Since > the dependency only exists between the sequence and the original parent > table, there's no problem with doing the wrong thing for child tables. > (Since int columns are probably more common than sequence objects, > I'd be inclined to do the join against pg_depend only when looking at > a sequence object, and if we get a match then to look through pg_dump's > internal list of tables to find and mark the corresponding column.) Hmm.. We'll need somekind of marker on the sequence to determine whether its a serial or not when (if) dependencies will be able to reach sequences within nextval (<seqname>.nextval support). Should the is_serial marker remain on sequences (as applied in the patch)? We still need a lot of of the hoop jumping in pg_dump to account for PRIMARY KEY's on a serial column. The implicitly created unique index at creation of a serial column doesn't allow ALTER TABLE ADD PRIMARY KEY() to work nicely -- especially since they generally get the same name. The hoops pushed PRIMARY KEY definition back into the CREATE TABLE statement when required. > psql's \d command could also be made to recognize serials the same > way, though I am inclined to leave it alone. I rather like the fact > that \d shows me the DEFAULT expression for a serial, because that > way I can easily see the name of the underlying sequence, in case > I want to do manual operations on the sequence. I didn't change the \d command in psql with this version of the patch for the very reasons you mentioned. > Do you want to attack it this way, or shall I? I'll see what I can come up with based on your comments to this message. All of that said, the corrections to the type definition regress tests should probably be corrected to account for domains anyway -- will submit as independent patch.
Rod Taylor <rbt@zort.ca> writes: > Hmm.. We'll need somekind of marker on the sequence to determine > whether its a serial or not when (if) dependencies will be able to reach > sequences within nextval (<seqname>.nextval support). Should the > is_serial marker remain on sequences (as applied in the patch)? No, it's not necessary and AFAICS it just offers another way to break existing clients that look at sequences. Notice I was recommending that you look for an internal-type dependency. The only way to create one that links a column and a sequence is a SERIAL declaration. A link that might someday be created because of the dependency expression would just be a normal dependency. > We still need a lot of of the hoop jumping in pg_dump to account for > PRIMARY KEY's on a serial column. The implicitly created unique index > at creation of a serial column doesn't allow ALTER TABLE ADD PRIMARY > KEY() to work nicely -- especially since they generally get the same > name. The "serial primary key" hack in analyze.c does make that a bit ugly, but one simple answer is not to try to dump as a serial if there's a primary key on the column. Now that I think about it, having pg_dump produce SERIAL declarations at all is a big performance loser -- you don't really want the index to exist while you are loading the table. A radical answer to that is to get rid of the implied UNIQUE constraint associated with SERIAL, thus making the combinations "serial unique" and "serial primary key" actually mean something. I fear this won't fly :-( but it might be worth proposing. If we have to maintain backwards compatibility, perhaps pg_dump could DROP the unique constraint just after making the table, and re-create it (possibly as a primary key rather than plain unique) when it makes indexes. regards, tom lane
> The "serial primary key" hack in analyze.c does make that a bit ugly, > but one simple answer is not to try to dump as a serial if there's a > primary key on the column. > A radical answer to that is to get rid of the implied UNIQUE constraint > associated with SERIAL, thus making the combinations "serial unique" and > "serial primary key" actually mean something. I fear this won't fly :-( My serials are almost always primary key's. Wouldn't change anything in that regard. Docs suggest that UNIQUE and NOT NULL be applied to the column anyway. http://www7.us.postgresql.org/users-lounge/docs/7.2/postgres/datatype.html#D I'll see what others do. > but it might be worth proposing. If we have to maintain backwards > compatibility, perhaps pg_dump could DROP the unique constraint just > after making the table, and re-create it (possibly as a primary key > rather than plain unique) when it makes indexes. This seems really ugly, but would do the trick.