Thread: Converting uuid primary key column to serial int
Hi all - For most of my database I use UUIDs as primary keys because, well, I just like it better and like being able to generate a key in the middle tier when I create new data. However, I have one table that has a very fixed and immutable set of data with a few thousand ingredients in it. This set very rarely changes (in fact when I change it, I have to restart the whole website).. The UUIDs for each ingredient are kinda a hassle to manage though, and I'm thinking in this case I might just want to use a 32bit integer for the primary key.. Almost every other table in the database has FK relationships with these ingredients, and there's all sorts of JOINs against it. So, making this change is gonna be fairly costly and require all sorts of schema changes to everything. I have a few related questions on this topic.. 1) What's the best way to convert an existing table from UUIDs to a serial INT column? I want to have Postgres just assign sequential numbers starting at 1 to all the rows. 2) Once I do this, I obviously need to fix up all the foreign keys and convert them from the old UUIDs to the new generated numeric value. Is there an easy way to manage this sort of thing? I'm thinking about keeping the old UUID as a non-PK column to use as a "mapping" as I migrate the rest of the schema over. Is that a good approach? 3) I'm also worried about backup/restores/replication. I often add new ingredients to my production database, and every so often backup this database and restore it to my dev box so I can work with production data. I need to make sure all these ingredient IDs stay constant, just as the UUIDs did. Does pg_dump generate SQL that will "re-create" the existing keys and set the identity watermark correctly? If someday I have replication enabled, do the same IDs get replicated consistantly? Any other "gotchas" here? Mike
> -----Original Message----- > From: pgsql-general-owner@postgresql.org [mailto:pgsql-general- > owner@postgresql.org] On Behalf Of Mike Christensen > Sent: Wednesday, June 08, 2011 2:57 PM > To: pgsql-general@postgresql.org > Subject: [GENERAL] Converting uuid primary key column to serial int > for each ingredient are kinda a hassle to manage though, and I'm thinking in > this case I might just want to use a 32bit integer for the primary key.. > 2) Once I do this, I obviously need to fix up all the foreign keys and convert > them from the old UUIDs to the new generated numeric value. > Is there an easy way to manage this sort of thing? I'm thinking about keeping > the old UUID as a non-PK column to use as a "mapping" as I migrate the rest > of the schema over. Is that a good approach? > What specific hassles are you encountering? Seems like a lot of effort to go through to make the data less-unique. Since you already have the field setup for PK/FK use try to leave that alone and just add a secondary identifier that you use for queries. Like you said in #2 leave the UUID in place as a non-PK (but still unique/not-null) field and leave all the foreign keys in place as well. Add your "integer identity" column as the new Primary Key and use that field when you want to specify a record. Then, for those cases where you have or want to use the integer PK but the field being compared is the UUID you can write a simple function to return the UUID associated with the given integer. Otherwise you probably should just add the integer field to EVERY TABLE and establish relational links. Add the field as null-able, perform the update using the UUID, change to not null, add FK constraint. As you then attempt to remove the UUID field from the tables the system will tell you where different function and views are relying upon the UUID field and you can change the object to use the new integer field instead. The main risk really is in the application since the database will not be able to enforce consistency. Serial is implemented via "DEFAULT" and sequences; and if you dump/restore a sequence it is done consistently. David J.
On Wed, Jun 8, 2011 at 1:06 PM, David Johnston <polobo@yahoo.com> wrote: >> -----Original Message----- >> From: pgsql-general-owner@postgresql.org [mailto:pgsql-general- >> owner@postgresql.org] On Behalf Of Mike Christensen >> Sent: Wednesday, June 08, 2011 2:57 PM >> To: pgsql-general@postgresql.org >> Subject: [GENERAL] Converting uuid primary key column to serial int > >> for each ingredient are kinda a hassle to manage though, and I'm thinking > in >> this case I might just want to use a 32bit integer for the primary key.. > >> 2) Once I do this, I obviously need to fix up all the foreign keys and > convert >> them from the old UUIDs to the new generated numeric value. >> Is there an easy way to manage this sort of thing? I'm thinking about > keeping >> the old UUID as a non-PK column to use as a "mapping" as I migrate the > rest >> of the schema over. Is that a good approach? >> > > What specific hassles are you encountering? Seems like a lot of effort to > go through to make the data less-unique. Since you already have the field > setup for PK/FK use try to leave that alone and just add a secondary > identifier that you use for queries. Like you said in #2 leave the UUID in > place as a non-PK (but still unique/not-null) field and leave all the > foreign keys in place as well. Add your "integer identity" column as the > new Primary Key and use that field when you want to specify a record. Then, > for those cases where you have or want to use the integer PK but the field > being compared is the UUID you can write a simple function to return the > UUID associated with the given integer. > > Otherwise you probably should just add the integer field to EVERY TABLE and > establish relational links. Add the field as null-able, perform the update > using the UUID, change to not null, add FK constraint. As you then attempt > to remove the UUID field from the tables the system will tell you where > different function and views are relying upon the UUID field and you can > change the object to use the new integer field instead. > > The main risk really is in the application since the database will not be > able to enforce consistency. > > Serial is implemented via "DEFAULT" and sequences; and if you dump/restore a > sequence it is done consistently. All very good points. A few reasons why I'm considering this: 1) On some pages, the user can pass in a search query (keywords to find, ingredients to exclude, etc).. I have a Base64 representation of this search query, and it gets pretty long if a bunch of UUIDs are included. Making them ints would make for much shorter URLs. There's a few other pages that have an ingredient ID URL parameter directly. I think URLs with UUIDs are kinda ugly. 2) I deal with a lot of this data through internal web based admin tools as well as Excel. There's a lot of places where I need to link some metadata to an existing ingredient. I'd like to just be able to refer to "eggs" as, say, 53, rather than some UUID for eggs. Sure, the tools could have auto-complete and auto-lookup stuff but I haven't had much time to work on these tools so they're super hacky right now. Both of these requirements, as you said, could be satisfied by using a non-primary key though. In fact, there's then no reason to change any other tables - internally the database would link to the ingredient UUIDs. I'm assuming I can still have a "Serial" column that is NOT a primary key, and it'll incremement just the same as I add rows? If that's the case, I think that's a superior approach.. BTW, this table is too small to worry about disk space of UUIDs and/or perhaps any sort of performance benefits to using int over uuid (if there are any).. Mike
> -----Original Message----- > From: Mike Christensen [mailto:mike@kitchenpc.com] > Sent: Wednesday, June 08, 2011 4:26 PM > To: David Johnston; pgsql-general@postgresql.org > Subject: Re: [GENERAL] Converting uuid primary key column to serial int > I'm assuming I can still have a "Serial" column that is NOT a primary key, and > it'll incremement just the same as I add rows? If that's the case, I think that's > a superior approach.. > > BTW, this table is too small to worry about disk space of UUIDs and/or > perhaps any sort of performance benefits to using int over uuid (if there are > any).. > > Mike " CREATE TABLE t ( field serial ); " is simply short-hand for " CREATE TABLE t (field integer DEFAULT nextval('t_seq') ); " where the sequence "t_seq" is automatically created and linked to the table. Nothing more and nothing less. Whether you add NOT NULL or, by extension, PRIMARY KEY, to the field as well as the "serial" datatype depends on whether you want to have those other properties. Have you considered giving the row for "eggs" the PK of "eggs"? You did say you have multiple thousands of records but neither the UUID or the integer is going to stop you from then having 2+ records with "eggs" as the name. If you want to see how many recipes use "eggs" what would you do to make sure you are not missing any? Even if you decide to keep the UUID and/or Integer as UNIQUE indices you should try and have something in the data itself that can be defined as UNIQUE. Since you are dealing with discreet items, without any kind of time property, it should be possible to do so. From an implementation perspective you will want to create the sequence and all using "serial" but allow "NULL" for the field. Once you've assigned all the existing records an ID (probably via the row() window function) you can setup the sequence to begin with the next available number. See docs for syntax. David J.
>> I'm assuming I can still have a "Serial" column that is NOT a primary key, > and >> it'll incremement just the same as I add rows? If that's the case, I > think that's >> a superior approach.. >> >> BTW, this table is too small to worry about disk space of UUIDs and/or >> perhaps any sort of performance benefits to using int over uuid (if there > are >> any).. >> >> Mike > > " CREATE TABLE t ( field serial ); " is simply short-hand for " CREATE TABLE > t (field integer DEFAULT nextval('t_seq') ); " where the sequence "t_seq" is > automatically created and linked to the table. > > Nothing more and nothing less. Whether you add NOT NULL or, by extension, > PRIMARY KEY, to the field as well as the "serial" datatype depends on > whether you want to have those other properties. > > Have you considered giving the row for "eggs" the PK of "eggs"? You did say > you have multiple thousands of records but neither the UUID or the integer > is going to stop you from then having 2+ records with "eggs" as the name. > If you want to see how many recipes use "eggs" what would you do to make > sure you are not missing any? Even if you decide to keep the UUID and/or > Integer as UNIQUE indices you should try and have something in the data > itself that can be defined as UNIQUE. Since you are dealing with discreet > items, without any kind of time property, it should be possible to do so. > > From an implementation perspective you will want to create the sequence and > all using "serial" but allow "NULL" for the field. Once you've assigned all > the existing records an ID (probably via the row() window function) you can > setup the sequence to begin with the next available number. See docs for > syntax. Yea, there's already a unique index on the ingredient name, so there can only be 1 "eggs" row.. I'm not too sure on using the name as the primary key.. The serial implementation makes sense.. If that's all that it does, I should be able to just update all rows and set the numeric value to nextval('t_seq') directly to order all my existing rows, then set the column to NOT NULL when I'm done, and set the sequence to the next available number.. Worst case I'll just write a little function that loops through my rows and numbers them all. I'll checkout the docs on this, thanks! Mike
On Wednesday, June 08, 2011 1:40:35 pm David Johnston wrote:
> > -----Original Message-----
> > From: Mike Christensen [mailto:mike@kitchenpc.com]
> > Sent: Wednesday, June 08, 2011 4:26 PM
> > To: David Johnston; pgsql-general@postgresql.org
> > Subject: Re: [GENERAL] Converting uuid primary key column to serial int
> >
> >
> > I'm assuming I can still have a "Serial" column that is NOT a primary
> > key,
>
> and
>
> > it'll incremement just the same as I add rows? If that's the case, I
>
> think that's
>
> > a superior approach..
> >
> > BTW, this table is too small to worry about disk space of UUIDs and/or
> > perhaps any sort of performance benefits to using int over uuid (if there
>
> are
>
> > any)..
> >
> > Mike
>
> " CREATE TABLE t ( field serial ); " is simply short-hand for " CREATE
> TABLE t (field integer DEFAULT nextval('t_seq') ); " where the sequence
> "t_seq" is automatically created and linked to the table.
>
Actually per the docs and for completeness sake:
http://www.postgresql.org/docs/9.0/interactive/datatype-numeric.html#DATATYPE-SERIAL
CREATE SEQUENCE tablename_colname_seq;
CREATE TABLE tablename (
colname integer NOT NULL DEFAULT nextval('tablename_colname_seq')
);
ALTER SEQUENCE tablename_colname_seq OWNED BY tablename.colname;
So the NOT NULL is already included.
>
> David J.
--
Adrian Klaver
adrian.klaver@gmail.com