Thread: Converting uuid primary key column to serial int

Converting uuid primary key column to serial int

From
Mike Christensen
Date:
Hi all -

For most of my database I use UUIDs as primary keys because, well, I
just like it better and like being able to generate a key in the
middle tier when I create new data.  However, I have one table that
has a very fixed and immutable set of data with a few thousand
ingredients in it.  This set very rarely changes (in fact when I
change it, I have to restart the whole website)..  The UUIDs for each
ingredient are kinda a hassle to manage though, and I'm thinking in
this case I might just want to use a 32bit integer for the primary
key..

Almost every other table in the database has FK relationships with
these ingredients, and there's all sorts of JOINs against it.  So,
making this change is gonna be fairly costly and require all sorts of
schema changes to everything.  I have a few related questions on this
topic..

1) What's the best way to convert an existing table from UUIDs to a
serial INT column?  I want to have Postgres just assign sequential
numbers starting at 1 to all the rows.

2) Once I do this, I obviously need to fix up all the foreign keys and
convert them from the old UUIDs to the new generated numeric value.
Is there an easy way to manage this sort of thing?  I'm thinking about
keeping the old UUID as a non-PK column to use as a "mapping" as I
migrate the rest of the schema over.  Is that a good approach?

3) I'm also worried about backup/restores/replication.  I often add
new ingredients to my production database, and every so often backup
this database and restore it to my dev box so I can work with
production data.  I need to make sure all these ingredient IDs stay
constant, just as the UUIDs did.  Does pg_dump generate SQL that will
"re-create" the existing keys and set the identity watermark
correctly?  If someday I have replication enabled, do the same IDs get
replicated consistantly?  Any other "gotchas" here?

Mike

Re: Converting uuid primary key column to serial int

From
"David Johnston"
Date:
> -----Original Message-----
> From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-
> owner@postgresql.org] On Behalf Of Mike Christensen
> Sent: Wednesday, June 08, 2011 2:57 PM
> To: pgsql-general@postgresql.org
> Subject: [GENERAL] Converting uuid primary key column to serial int

> for each ingredient are kinda a hassle to manage though, and I'm thinking
in
> this case I might just want to use a 32bit integer for the primary key..

> 2) Once I do this, I obviously need to fix up all the foreign keys and
convert
> them from the old UUIDs to the new generated numeric value.
> Is there an easy way to manage this sort of thing?  I'm thinking about
keeping
> the old UUID as a non-PK column to use as a "mapping" as I migrate the
rest
> of the schema over.  Is that a good approach?
>

What specific hassles are you encountering?  Seems like a lot of effort to
go through to make the data less-unique.  Since you already have the field
setup for PK/FK use try to leave that alone and just add a secondary
identifier that you use for queries.  Like you said in #2 leave the UUID in
place as a non-PK (but still unique/not-null) field and leave all the
foreign keys in place as well.  Add your "integer identity" column as the
new Primary Key and use that field when you want to specify a record.  Then,
for those cases where you have or want to use the integer PK but the field
being compared is the UUID you can write a simple function to return the
UUID associated with the given integer.

Otherwise you probably should just add the integer field to EVERY TABLE and
establish relational links.  Add the field as null-able, perform the update
using the UUID, change to not null, add FK constraint.  As you then attempt
to remove the UUID field from the tables the system will tell you where
different function and views are relying upon the UUID field and you can
change the object to use the new integer field instead.

The main risk really is in the application since the database will not be
able to enforce consistency.

Serial is implemented via "DEFAULT" and sequences; and if you dump/restore a
sequence it is done consistently.

David J.










Re: Converting uuid primary key column to serial int

From
Mike Christensen
Date:
On Wed, Jun 8, 2011 at 1:06 PM, David Johnston <polobo@yahoo.com> wrote:
>> -----Original Message-----
>> From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-
>> owner@postgresql.org] On Behalf Of Mike Christensen
>> Sent: Wednesday, June 08, 2011 2:57 PM
>> To: pgsql-general@postgresql.org
>> Subject: [GENERAL] Converting uuid primary key column to serial int
>
>> for each ingredient are kinda a hassle to manage though, and I'm thinking
> in
>> this case I might just want to use a 32bit integer for the primary key..
>
>> 2) Once I do this, I obviously need to fix up all the foreign keys and
> convert
>> them from the old UUIDs to the new generated numeric value.
>> Is there an easy way to manage this sort of thing?  I'm thinking about
> keeping
>> the old UUID as a non-PK column to use as a "mapping" as I migrate the
> rest
>> of the schema over.  Is that a good approach?
>>
>
> What specific hassles are you encountering?  Seems like a lot of effort to
> go through to make the data less-unique.  Since you already have the field
> setup for PK/FK use try to leave that alone and just add a secondary
> identifier that you use for queries.  Like you said in #2 leave the UUID in
> place as a non-PK (but still unique/not-null) field and leave all the
> foreign keys in place as well.  Add your "integer identity" column as the
> new Primary Key and use that field when you want to specify a record.  Then,
> for those cases where you have or want to use the integer PK but the field
> being compared is the UUID you can write a simple function to return the
> UUID associated with the given integer.
>
> Otherwise you probably should just add the integer field to EVERY TABLE and
> establish relational links.  Add the field as null-able, perform the update
> using the UUID, change to not null, add FK constraint.  As you then attempt
> to remove the UUID field from the tables the system will tell you where
> different function and views are relying upon the UUID field and you can
> change the object to use the new integer field instead.
>
> The main risk really is in the application since the database will not be
> able to enforce consistency.
>
> Serial is implemented via "DEFAULT" and sequences; and if you dump/restore a
> sequence it is done consistently.

All very good points.  A few reasons why I'm considering this:

1) On some pages, the user can pass in a search query (keywords to
find, ingredients to exclude, etc)..  I have a Base64 representation
of this search query, and it gets pretty long if a bunch of UUIDs are
included.  Making them ints would make for much shorter URLs.  There's
a few other pages that have an ingredient ID URL parameter directly.
I think URLs with UUIDs are kinda ugly.

2) I deal with a lot of this data through internal web based admin
tools as well as Excel.  There's a lot of places where I need to link
some metadata to an existing ingredient.  I'd like to just be able to
refer to "eggs" as, say, 53, rather than some UUID for eggs.  Sure,
the tools could have auto-complete and auto-lookup stuff but I haven't
had much time to work on these tools so they're super hacky right now.

Both of these requirements, as you said, could be satisfied by using a
non-primary key though.  In fact, there's then no reason to change any
other tables - internally the database would link to the ingredient
UUIDs.

I'm assuming I can still have a "Serial" column that is NOT a primary
key, and it'll incremement just the same as I add rows?  If that's the
case, I think that's a superior approach..

BTW, this table is too small to worry about disk space of UUIDs and/or
perhaps any sort of performance benefits to using int over uuid (if
there are any)..

Mike

Re: Converting uuid primary key column to serial int

From
"David Johnston"
Date:
> -----Original Message-----
> From: Mike Christensen [mailto:mike@kitchenpc.com]
> Sent: Wednesday, June 08, 2011 4:26 PM
> To: David Johnston; pgsql-general@postgresql.org
> Subject: Re: [GENERAL] Converting uuid primary key column to serial int


> I'm assuming I can still have a "Serial" column that is NOT a primary key,
and
> it'll incremement just the same as I add rows?  If that's the case, I
think that's
> a superior approach..
>
> BTW, this table is too small to worry about disk space of UUIDs and/or
> perhaps any sort of performance benefits to using int over uuid (if there
are
> any)..
>
> Mike

" CREATE TABLE t ( field serial ); " is simply short-hand for " CREATE TABLE
t (field integer DEFAULT nextval('t_seq') ); " where the sequence "t_seq" is
automatically created and linked to the table.

Nothing more and nothing less.  Whether you add NOT NULL or, by extension,
PRIMARY KEY, to the field as well as the "serial" datatype depends on
whether you want to have those other properties.

Have you considered giving the row for "eggs" the PK of "eggs"?  You did say
you have multiple thousands of records but neither the UUID or the integer
is going to stop you from then having 2+ records with "eggs" as the name.
If you want to see how many recipes use "eggs" what would you do to make
sure you are not missing any?  Even if you decide to keep the UUID and/or
Integer as UNIQUE indices you should try and have something in the data
itself that can be defined as UNIQUE.  Since you are dealing with discreet
items, without any kind of time property, it should be possible to do so.

From an implementation perspective you will want to create the sequence and
all using "serial" but allow "NULL" for the field.  Once you've assigned all
the existing records an ID (probably via the row() window function) you can
setup the sequence to begin with the next available number.  See docs for
syntax.

David J.



Re: Converting uuid primary key column to serial int

From
Mike Christensen
Date:
>> I'm assuming I can still have a "Serial" column that is NOT a primary key,
> and
>> it'll incremement just the same as I add rows?  If that's the case, I
> think that's
>> a superior approach..
>>
>> BTW, this table is too small to worry about disk space of UUIDs and/or
>> perhaps any sort of performance benefits to using int over uuid (if there
> are
>> any)..
>>
>> Mike
>
> " CREATE TABLE t ( field serial ); " is simply short-hand for " CREATE TABLE
> t (field integer DEFAULT nextval('t_seq') ); " where the sequence "t_seq" is
> automatically created and linked to the table.
>
> Nothing more and nothing less.  Whether you add NOT NULL or, by extension,
> PRIMARY KEY, to the field as well as the "serial" datatype depends on
> whether you want to have those other properties.
>
> Have you considered giving the row for "eggs" the PK of "eggs"?  You did say
> you have multiple thousands of records but neither the UUID or the integer
> is going to stop you from then having 2+ records with "eggs" as the name.
> If you want to see how many recipes use "eggs" what would you do to make
> sure you are not missing any?  Even if you decide to keep the UUID and/or
> Integer as UNIQUE indices you should try and have something in the data
> itself that can be defined as UNIQUE.  Since you are dealing with discreet
> items, without any kind of time property, it should be possible to do so.
>
> From an implementation perspective you will want to create the sequence and
> all using "serial" but allow "NULL" for the field.  Once you've assigned all
> the existing records an ID (probably via the row() window function) you can
> setup the sequence to begin with the next available number.  See docs for
> syntax.

Yea, there's already a unique index on the ingredient name, so there
can only be 1 "eggs" row..  I'm not too sure on using the name as the
primary key..

The serial implementation makes sense..  If that's all that it does, I
should be able to just update all rows and set the numeric value to
nextval('t_seq') directly to order all my existing rows, then set the
column to NOT NULL when I'm done, and set the sequence to the next
available number..  Worst case I'll just write a little function that
loops through my rows and numbers them all.  I'll checkout the docs on
this, thanks!

Mike

Re: Converting uuid primary key column to serial int

From
Adrian Klaver
Date:

On Wednesday, June 08, 2011 1:40:35 pm David Johnston wrote:

> > -----Original Message-----

> > From: Mike Christensen [mailto:mike@kitchenpc.com]

> > Sent: Wednesday, June 08, 2011 4:26 PM

> > To: David Johnston; pgsql-general@postgresql.org

> > Subject: Re: [GENERAL] Converting uuid primary key column to serial int

> >

> >

> > I'm assuming I can still have a "Serial" column that is NOT a primary

> > key,

>

> and

>

> > it'll incremement just the same as I add rows? If that's the case, I

>

> think that's

>

> > a superior approach..

> >

> > BTW, this table is too small to worry about disk space of UUIDs and/or

> > perhaps any sort of performance benefits to using int over uuid (if there

>

> are

>

> > any)..

> >

> > Mike

>

> " CREATE TABLE t ( field serial ); " is simply short-hand for " CREATE

> TABLE t (field integer DEFAULT nextval('t_seq') ); " where the sequence

> "t_seq" is automatically created and linked to the table.

>

Actually per the docs and for completeness sake:

http://www.postgresql.org/docs/9.0/interactive/datatype-numeric.html#DATATYPE-SERIAL

CREATE SEQUENCE tablename_colname_seq;

CREATE TABLE tablename (
    colname integer NOT NULL DEFAULT nextval('tablename_colname_seq')
);
ALTER SEQUENCE tablename_colname_seq OWNED BY tablename.colname;
So the NOT NULL is already included.

>

> David J.

--

Adrian Klaver

adrian.klaver@gmail.com