Thread: Using a postgres table to maintain unique id?
Hi, I have a distributed application that needs unique 64-bit "id" values. The current implementation keeps the current value for this id in a file on one system and uses a process to provide access to this file via CORBA. However, I also use Postgres (7.0.2) throughout this application and it seems cleaner to me to keep the current id value in a table and just use postgres to provide access (with a trigger function to increment the id on access). Is this reasonable? Is it fast? (I need 10 or more IDs generated each second.) Can I avoid having the table gradually fill with "old" rows for this entry, and this avoid the need to run VACUUM ANALYZE periodically? Any tips on how to implement the trigger would be appreciated. (Is it possible to generate an int8 sequence value?) Thanks! -- Steve Wampler- SOLIS Project, National Solar Observatory swampler@noao.edu
>However, I also use Postgres (7.0.2) throughout this >application and it seems cleaner to me to keep the current >id value in a table and just use postgres to provide access >(with a trigger function to increment the id on access). Why not a sequence? >Is this reasonable? Is it fast? (I need 10 or more IDs >generated each second.) Can I avoid having the table >gradually fill with "old" rows for this entry, and this >avoid the need to run VACUUM ANALYZE periodically? The only problem I have had with this type of thing is when a number gets deleted, it does not get recycled. Joshua Drake > >Any tips on how to implement the trigger would be >appreciated. (Is it possible to generate an int8 sequence >value?) > >Thanks! >-- >Steve Wampler- SOLIS Project, National Solar Observatory >swampler@noao.edu > -- -- <COMPANY>CommandPrompt - http://www.commandprompt.com </COMPANY> <PROJECT>OpenDocs, LLC. - http://www.opendocs.org </PROJECT> <PROJECT>LinuxPorts - http://www.linuxports.com </PROJECT> <WEBMASTER>LDP - http://www.linuxdoc.org </WEBMASTER> -- Instead of asking why a piece of software is using "1970s technology," start asking why software is ignoring 30 years of accumulated wisdom. --
Poet/Joshua Drake wrote: > > >However, I also use Postgres (7.0.2) throughout this > >application and it seems cleaner to me to keep the current > >id value in a table and just use postgres to provide access > >(with a trigger function to increment the id on access). > > Why not a sequence? Can someone show me how to create (and use) an int8 sequence? > >Is this reasonable? Is it fast? (I need 10 or more IDs > >generated each second.) Can I avoid having the table > >gradually fill with "old" rows for this entry, and this > >avoid the need to run VACUUM ANALYZE periodically? > > The only problem I have had with this type of thing is when a number gets > deleted, it does not get recycled. Fortunately, I don't want any number to ever get recycled - the id needs to be unique throughout the 25+ year lifetime of the project. The table would have a single row with a single column. Selecting that table cell would return the current value, but leave the value incremented in the table cell (it's ok if it increments the value before returning). -- Steve Wampler- SOLIS Project, National Solar Observatory swampler@noao.edu
On Mon, 13 Nov 2000 10:44:21 -0700, Steve Wampler said: > Poet/Joshua Drake wrote: > > > > >However, I also use Postgres (7.0.2) throughout this > > >application and it seems cleaner to me to keep the current > > >id value in a table and just use postgres to provide access > > >(with a trigger function to increment the id on access). > > > > Why not a sequence? > > Can someone show me how to create (and use) an int8 sequence? > create sequnece seq_name ; there are additional options start sql and type drfrog=# \h create sequence Command: CREATE SEQUENCE Description: Creates a new sequence number generator Syntax: CREATE SEQUENCE seqname [ INCREMENT increment ] [ MINVALUE minvalue ] [ MAXVALUE maxvalue ] [ START start ] [ CACHE cache] [ CYCLE ] drfrog=# more info in the docs too! > > >Is this reasonable? Is it fast? (I need 10 or more IDs > > >generated each second.) Can I avoid having the table > > >gradually fill with "old" rows for this entry, and this > > >avoid the need to run VACUUM ANALYZE periodically? > > > > The only problem I have had with this type of thing is when a number gets > > deleted, it does not get recycled. > > Fortunately, I don't want any number to ever get recycled - the id needs to > be unique throughout the 25+ year lifetime of the project. The table > would have a single row with a single column. Selecting that table cell > would return the current value, but leave the value incremented in the > table cell (it's ok if it increments the value before returning). > > > -- > Steve Wampler- SOLIS Project, National Solar Observatory > swampler@noao.edu > >
On 13 Nov 2000 10:30:55 PST, Dr Frog wrote: >create sequnece seq_name ; > >there are additional options >start sql and type What are the "SQL" and "TYPE" options? I don't see reference to them in the docs. Can I use the TYPE option to create a sequence that's based in int8 (64 bit integer) instead of int4? That's exactly what I've been wanting! What version of postgres supports this?
Marten Feldtmann wrote: > > Throw away all the "hardwired"-stuff and do it with software. I > once described an algorithm in one of this lists how to create > unique values for clients without minimum interaction with the > database. > > The result: query once in the beginning of your application, > generate your id's "offline" at the maximum speed you may > have and store your last generated id when your client > finished. Superior to all the "hardwired"-database solutions ! Yes, but... (1) The application I have is composed of about 50 processes running on 3 different OS/architectures (Linux/intel, Solaris/sparc,and VxWorks/ppc). The IDs I need must be unique across all processes (I suppose one solution would beto provide each ID with a unique prefix based on the process that is running, but...) (2) Some of these systems are real-time boxes that might get rebooted at any moment, or might hang for hardware-related reasons [I'd like to able to say that all of the processes could detect imminent failure, but unfortunately,I can't]. So determining when a client "finishes" is not always possible, which prevents (he claims) theabove solution from claiming ID uniqueness. However, it might be sufficient to provide a process on the postgres DB machine (if *that* machine dies, *everything* stops...) that serves IDs via CORBA to all the other applications and (internally) uses the "software" approach given above. This process could "sync" with the database every N seconds or so (where N might be < 1.0). This, while still not guaranteeing uniqueness, would at least come pretty close... It would still be nice to avoid having to VACUUM ANALYZE this table, though, and it "feels" as though it is duplicating functionality already provided by postgres DB backends. I'll think about this solution - thanks! -- Steve Wampler- SOLIS Project, National Solar Observatory swampler@noao.edu
> Can someone show me how to create (and use) an int8 > sequence? From what I can tell (both from docs and doing a describe on sequences in my database), a postgresql sequence is an int4, not an int8, and thus you are limited to a max of 2.1 billion values. If you require an int8 sequence, you'll probably have to manage your own and just use an int8 column. > Fortunately, I don't want any number to ever get > recycled - the id needs to > be unique throughout the 25+ year lifetime of the > project. The table > would have a single row with a single column. > Selecting that table cell > would return the current value, but leave the value > incremented in the > table cell (it's ok if it increments the value > before returning). If 2.1 billion unique values is enough, then a sequence will work fine. Performance of sequence should be ok. An alternative you might consider is creating your own "sequence server" external to the database. Michael __________________________________________________ Do You Yahoo!? Yahoo! Calendar - Get organized for the holidays! http://calendar.yahoo.com/
At 11/13/2000 06:22 PM -0800, Michael Teter wrote:<br /><blockquote cite="cite" class="cite" type="cite">> Can someoneshow me how to create (and use) an int8<br /> > sequence?<br /><br /> From what I can tell (both from docs anddoing a<br /> describe on sequences in my database), a postgresql<br /> sequence is an int4, not an int8, and thus youare<br /> limited to a max of 2.1 billion values.<br /><br /> If you require an int8 sequence, you'll probably have<br/> to manage your own and just use an int8 column.<br /></blockquote><p> I had originally started using int8 and creatingcustom sequences. However, as mentioned in a previous post, there is an inherent performance penalty in using int8over int4. Tom Lane advised me that the int8 routines are an emulated or synthesized data type. in the test I didon our 7.0.2 server I notice about a 25-30% decrease in performance when using complex joins on tables containing referentialkeys, primary keys all in the int8 data type.<br /><br /> This might be something to think about as well.<br /><br/><br /><br />
Thomas Swan wrote: > > At 11/13/2000 06:22 PM -0800, Michael Teter wrote: > > > > From what I can tell (both from docs and doing a > > describe on sequences in my database), a postgresql > > sequence is an int4, not an int8, and thus you are > > limited to a max of 2.1 billion values. > > > > If you require an int8 sequence, you'll probably have > > to manage your own and just use an int8 column. > > > I had originally started using int8 and creating custom sequences. However, > as mentioned in a previous post, there is an inherent performance penalty in > using int8 over int4. Tom Lane advised me that the int8 routines are an > emulated or synthesized data type. in the test I did on our 7.0.2 server I > notice about a 25-30% decrease in performance when using complex joins on > tables containing referential keys, primary keys all in the int8 data type. > > This might be something to think about as well. Thanks. Because of these and other comments people have made, I've gone back to using a flat_file-with-server approach instead of adding a table to my postgres DB. While an int4 *might* work, it doesn't handle the "worst-case" scenario (which is up around 15 billion values). Thanks to everyone for your comments and suggestions! -- Steve Wampler- SOLIS Project, National Solar Observatory swampler@noao.edu
Re: Using a postgres table to maintain unique id?
From
M.Feldtmann@t-online.de (Marten Feldtmann)
Date:
Steve Wampler schrieb: > > Poet/Joshua Drake wrote: > ? > ? ?However, I also use Postgres (7.0.2) throughout this > ? ?application and it seems cleaner to me to keep the current > ? ?id value in a table and just use postgres to provide access > ? ?(with a trigger function to increment the id on access). > ? > ? Why not a sequence? > > Can someone show me how to create (and use) an int8 sequence? > > ? ?Is this reasonable? Is it fast? (I need 10 or more IDs > ? ?generated each second.) Can I avoid having the table > ? ?gradually fill with "old" rows for this entry, and this > ? ?avoid the need to run VACUUM ANALYZE periodically? Throw away all the "hardwired"-stuff and do it with software. I once described an algorithm in one of this lists how to create unique values for clients without minimum interaction with the database. The result: query once in the beginning of your application, generate your id's "offline" at the maximum speed you may have and store your last generated id when your client finished. Superior to all the "hardwired"-database solutions ! Marten
Re: Using a postgres table to maintain unique id?
From
M.Feldtmann@t-online.de (Marten Feldtmann)
Date:
Steve Wampler schrieb: > > Yes, but... > > (1) The application I have is composed of about 50 processes > running on 3 different OS/architectures (Linux/intel, > Solaris/sparc, and VxWorks/ppc). The IDs I need must be > unique across all processes (I suppose one solution would > be to provide each ID with a unique prefix based on the > process that is running, but...) We've build a document management system using this system and the clients all created ids are based on the a kind of high-low algorithm to create unique indices. The indices are unique among all possible clients ... the number of clients does not matter. As I said before: better than any hardwired solution. You have two database queries among the normal lifetime of a client to get the base information to create unique clients .. during the lifetime the ids are created offline and they are garanteed to be unique. Actually we're now in the process to build an object-oriented PPD system and we use the same algorithm again. > > (2) Some of these systems are real-time boxes that might get > rebooted at any moment, or might hang for hardware-related > reasons [I'd like to able to say that all of the processes > could detect imminent failure, but unfortunately, I can't]. > So determining when a client "finishes" is not always possible, > which prevents (he claims) the above solution from claiming > ID uniqueness. > It does not matter until your machines do not reboot every second but even then you may get along for ten or 20 years before you ran out of indices. > (where N might be < 1.0). This, while still not guaranteeing > uniqueness, would at least come pretty close... It would still be > nice to avoid having to VACUUM ANALYZE this table, though, and it The base idea for all of it is simple: The unique id is based on three integer numbers: a) id-1 is a class id number (16 bit ?)b) id-2 is a global-session-number (32 bit): nc) id-3 is a local-session-number (32bit): x The id-3, id-2 and id-1 are converted to the base 36 and by this they are converted to strings. The result unique id is about 15 characters long. (6+6+3) We need a table to hold pairs of "global-id, local-id", this table is initially empty. When a client starts, it connects to the database, lockes this table and now the following happens: a) if the table is empty, the client uses (1,0) for its own and stores (2,0) for the next client into the table. b) if the table has ONE entry, the client removes the pair (n,x) from the table and stores (n+1,0) into the table. c) if the table has more than one entry, the client takes any entry (normaly the one with the lowest n) from the tableand removes it. d) the client unlocks the table Now the client is able to create offline up to 2^32 new unique identifiers. Increasing the numbers above and you get even more possible values. They create unique identifieres like (n,x), (n,x+1), ... If the client reaches this limit during lifetime it does the above again. If the client terminates, it writes it actual pair into this table. Ok, that's it. If you want to have more information ... just contact me. Marten