Thread: Dump serials as serial -- not a sequence

Dump serials as serial -- not a sequence

From
Rod Taylor
Date:
- Fix an issue indexing domains.  Ensure we look for the operator class
of the base type, not the domain type.
- Add an is_serial boolean attribute to sequence tables.
- Add serial4 and serial8 domains with respective serial and bigserial
aliases (similar to int4 and int8 to integer and bigint)
- Jump through hoops to deal with a primary key'd serial column, alter
table add primary key doesn't work for these.
- Fix type sanity checks to account for domains.

End result, pg_dump no longer uses sequences for serials and psql shows
serials rather than integer columns.

Attachment

Re: Dump serials as serial -- not a sequence

From
Bruce Momjian
Date:
Your patch has been added to the PostgreSQL unapplied patches list at:

    http://candle.pha.pa.us/cgi-bin/pgpatches

I will try to apply it within the next 48 hours.

---------------------------------------------------------------------------


Rod Taylor wrote:
> - Fix an issue indexing domains.  Ensure we look for the operator class
> of the base type, not the domain type.
> - Add an is_serial boolean attribute to sequence tables.
> - Add serial4 and serial8 domains with respective serial and bigserial
> aliases (similar to int4 and int8 to integer and bigint)
> - Jump through hoops to deal with a primary key'd serial column, alter
> table add primary key doesn't work for these.
> - Fix type sanity checks to account for domains.
>
> End result, pg_dump no longer uses sequences for serials and psql shows
> serials rather than integer columns.

[ Attachment, skipping... ]

>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
> http://archives.postgresql.org

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: Dump serials as serial -- not a sequence

From
Tom Lane
Date:
Rod Taylor <rbt@zort.ca> writes:
> - Add an is_serial boolean attribute to sequence tables.
> - Add serial4 and serial8 domains with respective serial and bigserial
> aliases (similar to int4 and int8 to integer and bigint)
> - Jump through hoops to deal with a primary key'd serial column, alter
> table add primary key doesn't work for these.
> - Fix type sanity checks to account for domains.
> End result, pg_dump no longer uses sequences for serials and psql shows
> serials rather than integer columns.

I think we have to reject this patch.  The killer problem with turning
serial columns into a separate serial domain type is that it will break
existing clients, which are expecting the column type to look like plain
int4 (or int8 if they are new enough to know about bigserial).  As an
example, the ODBC driver will fail to recognize "serial4" as a numeric
datatype at all; I suspect the same is true of JDBC.

We could run around and try to fix the clients --- indeed, I suspect
the ODBC/JDBC drivers will need some kind of generic way of dealing
with domain types.  (Anyone for exposing getBaseType as a SQL function?)
However, the lack of domain support in old clients doesn't break those
clients as long as they're dealing with existing schemas, which haven't
got domains ... unless we try to change serial into a domain.  Then we'd
definitely be creating a compatibility issue for existing applications.

The bottom line is that I do not think it's worth the trouble to change
serial into a domain, because the entire motivation for this was to make
pg_dump simpler, and a look at your patch shows that it's made pg_dump's
life harder not easier.

The reason this isn't good for pg_dump is that inherited serial columns
don't work nicely for pg_dump if it tries to use the column datatype
to track them.  What pg_dump wants is to consider only the parent
table to have a true "serial" column, while treating child tables as
plain int columns with a default clause, same as it ever was.  But
expressing serial-ness as a datatype causes it to inherit, and then
you have to jump through hoops to stop the child tables from being
treated as having serial columns.

I think the right way to attack this is to leave the backend code and
database representation for serials alone.  pg_dump should be taught to
recognize a serial-column association on the basis of there being an
internal-type dependency between an int column and a sequence.  Since
the dependency only exists between the sequence and the original parent
table, there's no problem with doing the wrong thing for child tables.
(Since int columns are probably more common than sequence objects,
I'd be inclined to do the join against pg_depend only when looking at
a sequence object, and if we get a match then to look through pg_dump's
internal list of tables to find and mark the corresponding column.)

psql's \d command could also be made to recognize serials the same
way, though I am inclined to leave it alone.  I rather like the fact
that \d shows me the DEFAULT expression for a serial, because that
way I can easily see the name of the underlying sequence, in case
I want to do manual operations on the sequence.

Do you want to attack it this way, or shall I?

            regards, tom lane

Re: Dump serials as serial -- not a sequence

From
Rod Taylor
Date:
On Sat, 2002-08-17 at 14:16, Tom Lane wrote:
> Rod Taylor <rbt@zort.ca> writes:
> > - Add an is_serial boolean attribute to sequence tables.
> > - Add serial4 and serial8 domains with respective serial and bigserial
> > aliases (similar to int4 and int8 to integer and bigint)
> > - Jump through hoops to deal with a primary key'd serial column, alter
> > table add primary key doesn't work for these.
> > - Fix type sanity checks to account for domains.
> > End result, pg_dump no longer uses sequences for serials and psql shows
> > serials rather than integer columns.

> The bottom line is that I do not think it's worth the trouble to change
> serial into a domain, because the entire motivation for this was to make
> pg_dump simpler, and a look at your patch shows that it's made pg_dump's
> life harder not easier.

I was also trying to make it easier for frontends to determine an
auto-incrementing column from one with a simple default. Right now the
test is a default with nextval() in it.  However, I could also
accommodate that with a simple pg_is_serial(table oid, attnum) style
function -- pg_dump can use it too I suppose.

> I think the right way to attack this is to leave the backend code and
> database representation for serials alone.  pg_dump should be taught to
> recognize a serial-column association on the basis of there being an
> internal-type dependency between an int column and a sequence.  Since
> the dependency only exists between the sequence and the original parent
> table, there's no problem with doing the wrong thing for child tables.
> (Since int columns are probably more common than sequence objects,
> I'd be inclined to do the join against pg_depend only when looking at
> a sequence object, and if we get a match then to look through pg_dump's
> internal list of tables to find and mark the corresponding column.)

Hmm..  We'll need somekind of marker on the sequence to determine
whether its a serial or not when (if) dependencies will be able to reach
sequences within nextval (<seqname>.nextval support).  Should the
is_serial marker remain on sequences (as applied in the patch)?

We still need a lot of of the hoop jumping in pg_dump to account for
PRIMARY KEY's on a serial column.  The implicitly created unique index
at creation of a serial column doesn't allow ALTER TABLE ADD PRIMARY
KEY() to work nicely -- especially since they generally get the same
name.  The hoops pushed PRIMARY KEY definition back into the CREATE
TABLE statement when required.

> psql's \d command could also be made to recognize serials the same
> way, though I am inclined to leave it alone.  I rather like the fact
> that \d shows me the DEFAULT expression for a serial, because that
> way I can easily see the name of the underlying sequence, in case
> I want to do manual operations on the sequence.

I didn't change the \d command in psql with this version of the patch
for the very reasons you mentioned.

> Do you want to attack it this way, or shall I?

I'll see what I can come up with based on your comments to this message.

All of that said, the corrections to the type definition regress tests
should probably be corrected to account for domains anyway -- will
submit as independent patch.




Re: Dump serials as serial -- not a sequence

From
Tom Lane
Date:
Rod Taylor <rbt@zort.ca> writes:
> Hmm..  We'll need somekind of marker on the sequence to determine
> whether its a serial or not when (if) dependencies will be able to reach
> sequences within nextval (<seqname>.nextval support).  Should the
> is_serial marker remain on sequences (as applied in the patch)?

No, it's not necessary and AFAICS it just offers another way to break
existing clients that look at sequences.

Notice I was recommending that you look for an internal-type dependency.
The only way to create one that links a column and a sequence is a
SERIAL declaration.  A link that might someday be created because of
the dependency expression would just be a normal dependency.

> We still need a lot of of the hoop jumping in pg_dump to account for
> PRIMARY KEY's on a serial column.  The implicitly created unique index
> at creation of a serial column doesn't allow ALTER TABLE ADD PRIMARY
> KEY() to work nicely -- especially since they generally get the same
> name.

The "serial primary key" hack in analyze.c does make that a bit ugly,
but one simple answer is not to try to dump as a serial if there's a
primary key on the column.

Now that I think about it, having pg_dump produce SERIAL declarations
at all is a big performance loser -- you don't really want the index
to exist while you are loading the table.

A radical answer to that is to get rid of the implied UNIQUE constraint
associated with SERIAL, thus making the combinations "serial unique" and
"serial primary key" actually mean something.  I fear this won't fly :-(
but it might be worth proposing.  If we have to maintain backwards
compatibility, perhaps pg_dump could DROP the unique constraint just
after making the table, and re-create it (possibly as a primary key
rather than plain unique) when it makes indexes.

            regards, tom lane

Re: Dump serials as serial -- not a sequence

From
Rod Taylor
Date:
> The "serial primary key" hack in analyze.c does make that a bit ugly,
> but one simple answer is not to try to dump as a serial if there's a
> primary key on the column.



> A radical answer to that is to get rid of the implied UNIQUE constraint
> associated with SERIAL, thus making the combinations "serial unique" and
> "serial primary key" actually mean something.  I fear this won't fly :-(

My serials are almost always primary key's.  Wouldn't change anything in
that regard.

Docs suggest that UNIQUE and NOT NULL be applied to the column anyway.
http://www7.us.postgresql.org/users-lounge/docs/7.2/postgres/datatype.html#D

I'll see what others do.

> but it might be worth proposing.  If we have to maintain backwards
> compatibility, perhaps pg_dump could DROP the unique constraint just
> after making the table, and re-create it (possibly as a primary key
> rather than plain unique) when it makes indexes.

This seems really ugly, but would do the trick.