Re: [GENERAL] using ID as a key - Mailing list pgsql-general

From Ed Loehr
Subject Re: [GENERAL] using ID as a key
Date
Msg-id 389EEBC1.AC9E65E9@austin.rr.com
Whole thread Raw
In response to Re: [GENERAL] using ID as a key  (davidb@vectormath.com)
Responses Re: [GENERAL] using ID as a key  ("Ross J. Reedstrom" <reedstrm@wallace.ece.rice.edu>)
Re: [GENERAL] using ID as a key  (<kaiq@realtyideas.com>)
Re: [GENERAL] using ID as a key  (Marten Feldtmann <marten@feki.toppoint.de>)
List pgsql-general
Sevo Stille wrote:
>
> davidb@vectormath.com wrote:
> >
> > ID generation ought to be handled programmatically.
>
> At the server? This is what OIDs do - alas, these are even less portable
> than serials and sequences. At the client interface? Nice, as long as
> you have a single user database. In a multiuser environment, generating
> and maintaining a unique ID externally to the database is close to
> impossible. Simply incrementing the highest available ID from the
> database by one is error prone, even if you catch duplicate insertions
> by making the ID field UNIQUE and incrementing the ID in a loop until
> you succeed - a slow client in a heavily updated database might
> permanently fail to insert his record before the generated ID is used by
> some other, faster client. And generating unique IDs based on an
> database-independent would require some synchronized mechanism for ID
> generation, adding dependency on the ID source to dependency on the
> database.

These are resolvable problems.  One way to do a programmatic ID
generation as David B. suggests is to have a DB server whose purpose
is to issue "seed numbers" which designate, through use of a
pre-determined multiplier, a *range* of allocated IDs guaranteed to be
unique across all seed requesters.  This avoids both of the problems
raised above (throughput and synchronization).

For example, suppose we have 10 servers, a1 through a10, each of whom
needs to insert new primary keys that are unique across all 10
servers.  Upon startup, each server queries the seed server, call it
S, for a seed.  Suppose the pre-determined multiplier is 100,000,
meaning ranges will be allocated in sets of 100,000.  S returns 1 on
the first request from, say, a7.  That means that a7 is now allocated
IDs 100,000 thru 199,999.  S might then return seed value 2 to a4,
meaning a4 can use 200,000 thru 299,999, and so on for each of the
other servers.  Internally, S could simply use a sequence object to
handle the atomic update requirement.  When S's next internal sequence
value is at 7, for example, that means it has allocated IDs through
699,999.

This removes the bottleneck on serial generation at the expense of
(infrequent) dependencies on the seed server S, and the serials are
not time-ordered.  A few more details must be handled, and there is
some maintainance overhead, but it seems to work fairly well.

Cheers,
Ed Loehr

pgsql-general by date:

Previous
From: Frank R Callaghan
Date:
Subject: formatting dates?
Next
From: Sevo Stille
Date:
Subject: Re: [GENERAL] formatting dates?