Thread: Re: [GENERAL] using ID as a key

Re: [GENERAL] using ID as a key

From

davidb@vectormath.com

Date:

04 February 2000, 17:09:54

Hi Sheila,

For general database design considerations (not specific to Postgres) I
disagree with the others on the use of serials and sequences.  These
things never migrate well from platform to platform, they often break, and
dealing with them is a nightmare if you ever have to do any bulk data
copying.
ID generation ought to be handled programmatically.

As far as referential integrity is concerned, that is supposed to be
implemented in the next release, due out real soon now.  Personally I'm
waiting for that release.

Finally, since you proclaim that you are new to database design, I cannot
pass up this opportunity strike out against the forces of evil.  Unless you
ABSOLUTELY cannot avoid it, never use more than one field as your primary
key.  Oh, it seems like a good idea at design time, but it's not.  If you
use
complex primary keys, you and your programmers will regret it for the rest
of
the application's useful life.

For what it's worth . . . I would do it as follows:

CREATE TABLE tbl_agency
(
    nagencyid    INT    NOT NULL,
    szotherdata    VARCHAR(30)
);
CREATE UNIQUE INDEX tbl_agency_pk ON tbl_agency (nagencyid);

CREATE TABLE tbl_employee
(
    nemployeeid    INT    NOT NULL,
    nagencyid        INT,
    szotherdata    VARCHAR(30)
);
CREATE UNIQUE INDEX tbl_employee_pk ON tbl_employee (nemployeeid)

David Boerwinkle

-----Original Message-----
From: sheila bel <sheilabel@hotmail.com>
To: pgsql-general@postgreSQL.org <pgsql-general@postgreSQL.org>
Date: Friday, February 04, 2000 10:50 AM
Subject: [GENERAL] using ID as a key


>Hi,
>
>I'm new to data base design so please bare with me if my
>question is so basic..
>I'm designing a database, two of the tables are EMPLOYEES and
>AGENCIES. I need an ID for each of them. I would also need to
>have the agencyID in the EMPLOYEES table so that I can identify
>which agency they belong to something like a foreign key. I know
>postgreSQL does not support it so how do I implement this ?
>What kind of data type should I use for the ID ? In general I
>do not know how to implement IDs and use them as keys.
>I've read the user manual and a bit of the programmer's
>manual several times but haven't found anything that will
>apply to this situation.
>
>Please Help. Thank You.
>
>-Sheila
>
>
>
>
>************
>

Re: [GENERAL] using ID as a key

From

Ed Loehr

Date:

04 February 2000, 17:37:55

davidb@vectormath.com wrote:
>
> Hi Sheila,
>
> For general database design considerations (not specific to Postgres) I
> disagree with the others on the use of serials and sequences.  These
> things never migrate well from platform to platform, they often break, and
> dealing with them is a nightmare if you ever have to do any bulk data
> copying.
> ID generation ought to be handled programmatically.

I'm ignorant of these problems with serials, but would like to
consider how to avoid them.  Can you give examples of these 3 cases?

Cheers,
Ed Loehr

Re: [GENERAL] using ID as a key

From

Simon Drabble

Date:

04 February 2000, 17:38:55

On Fri, 4 Feb 2000 davidb@vectormath.com wrote:

>
> Finally, since you proclaim that you are new to database design, I cannot
> pass up this opportunity strike out against the forces of evil.  Unless you
> ABSOLUTELY cannot avoid it, never use more than one field as your primary
> key.  Oh, it seems like a good idea at design time, but it's not.  If you
> use
> complex primary keys, you and your programmers will regret it for the rest
> of
> the application's useful life.
>
>
> David Boerwinkle
>

David,

What makes this such a bad idea? I am just now implementing a db that uses two
fields as primary key, and I'd like to know what I'm letting myself and
colleagues in for! I can only guess that it will slow down queries, but that's
not something a programmer should necessarily be concerned with..

(Thankfully it is not too late to change the design if need be :) )

Simon.

--
 "Linux - open doors, not windows."

   Simon Drabble                      It's like karma for your brain.
   simon@eskimo.com

Re: [GENERAL] using ID as a key

From

Date:

04 February 2000, 17:46:55

oracle, sql server (identity property, closer to pg's oid
but more manipulatable) have sequence.
however, since they are not sql92, so, it is a headache.
I'm now porting linux/pg/apache/perl/cgi to NT/MSsql/iis/perl/cgi,
this is a problem -- I'm now reading doc from M$ on how to
migrate oracle to mssql to get the hint of how to migrate
pg to mssql.

however, "programmatically" is really ugly: you have to use
flock, easy to be the bottleneck. -- not sure tho, any ideas?

On Fri, 4 Feb 2000 davidb@vectormath.com wrote:

> Hi Sheila,
>
> For general database design considerations (not specific to Postgres) I
> disagree with the others on the use of serials and sequences.  These
> things never migrate well from platform to platform, they often break, and
> dealing with them is a nightmare if you ever have to do any bulk data
> copying.
> ID generation ought to be handled programmatically.
>
> As far as referential integrity is concerned, that is supposed to be
> implemented in the next release, due out real soon now.  Personally I'm
> waiting for that release.
>
> Finally, since you proclaim that you are new to database design, I cannot
> pass up this opportunity strike out against the forces of evil.  Unless you
> ABSOLUTELY cannot avoid it, never use more than one field as your primary
> key.  Oh, it seems like a good idea at design time, but it's not.  If you
> use
> complex primary keys, you and your programmers will regret it for the rest
> of
> the application's useful life.
>
> For what it's worth . . . I would do it as follows:
>
> CREATE TABLE tbl_agency
> (
>     nagencyid    INT    NOT NULL,
>     szotherdata    VARCHAR(30)
> );
> CREATE UNIQUE INDEX tbl_agency_pk ON tbl_agency (nagencyid);
>
> CREATE TABLE tbl_employee
> (
>     nemployeeid    INT    NOT NULL,
>     nagencyid        INT,
>     szotherdata    VARCHAR(30)
> );
> CREATE UNIQUE INDEX tbl_employee_pk ON tbl_employee (nemployeeid)
>
> David Boerwinkle
>
> -----Original Message-----
> From: sheila bel <sheilabel@hotmail.com>
> To: pgsql-general@postgreSQL.org <pgsql-general@postgreSQL.org>
> Date: Friday, February 04, 2000 10:50 AM
> Subject: [GENERAL] using ID as a key
>
>
> >Hi,
> >
> >I'm new to data base design so please bare with me if my
> >question is so basic..
> >I'm designing a database, two of the tables are EMPLOYEES and
> >AGENCIES. I need an ID for each of them. I would also need to
> >have the agencyID in the EMPLOYEES table so that I can identify
> >which agency they belong to something like a foreign key. I know
> >postgreSQL does not support it so how do I implement this ?
> >What kind of data type should I use for the ID ? In general I
> >do not know how to implement IDs and use them as keys.
> >I've read the user manual and a bit of the programmer's
> >manual several times but haven't found anything that will
> >apply to this situation.
> >
> >Please Help. Thank You.
> >
> >-Sheila
> >
> >
> >
> >
> >************
> >
>
>
> ************
>

Re: [GENERAL] using ID as a key

From

Sevo Stille

Date:

07 February 2000, 07:29:09

davidb@vectormath.com wrote:
>
> Hi Sheila,
>
> For general database design considerations (not specific to Postgres) I
> disagree with the others on the use of serials and sequences.  These
> things never migrate well from platform to platform, they often break, and
> dealing with them is a nightmare if you ever have to do any bulk data
> copying.
> ID generation ought to be handled programmatically.

At the server? This is what OIDs do - alas, these are even less portable
than serials and sequences. At the client interface? Nice, as long as
you have a single user database. In a multiuser environment, generating
and maintaining a unique ID externally to the database is close to
impossible. Simply incrementing the highest available ID from the
database by one is error prone, even if you catch duplicate insertions
by making the ID field UNIQUE and incrementing the ID in a loop until
you succeed - a slow client in a heavily updated database might
permanently fail to insert his record before the generated ID is used by
some other, faster client. And generating unique IDs based on an
database-independent would require some synchronized mechanism for ID
generation, adding dependency on the ID source to dependency on the
database.

Sevo

--
Sevo Stille
sevo@ip23.net

Re: [GENERAL] using ID as a key

From

Ed Loehr

Date:

07 February 2000, 10:57:12

Sevo Stille wrote:
>
> davidb@vectormath.com wrote:
> >
> > ID generation ought to be handled programmatically.
>
> At the server? This is what OIDs do - alas, these are even less portable
> than serials and sequences. At the client interface? Nice, as long as
> you have a single user database. In a multiuser environment, generating
> and maintaining a unique ID externally to the database is close to
> impossible. Simply incrementing the highest available ID from the
> database by one is error prone, even if you catch duplicate insertions
> by making the ID field UNIQUE and incrementing the ID in a loop until
> you succeed - a slow client in a heavily updated database might
> permanently fail to insert his record before the generated ID is used by
> some other, faster client. And generating unique IDs based on an
> database-independent would require some synchronized mechanism for ID
> generation, adding dependency on the ID source to dependency on the
> database.

These are resolvable problems.  One way to do a programmatic ID
generation as David B. suggests is to have a DB server whose purpose
is to issue "seed numbers" which designate, through use of a
pre-determined multiplier, a *range* of allocated IDs guaranteed to be
unique across all seed requesters.  This avoids both of the problems
raised above (throughput and synchronization).

For example, suppose we have 10 servers, a1 through a10, each of whom
needs to insert new primary keys that are unique across all 10
servers.  Upon startup, each server queries the seed server, call it
S, for a seed.  Suppose the pre-determined multiplier is 100,000,
meaning ranges will be allocated in sets of 100,000.  S returns 1 on
the first request from, say, a7.  That means that a7 is now allocated
IDs 100,000 thru 199,999.  S might then return seed value 2 to a4,
meaning a4 can use 200,000 thru 299,999, and so on for each of the
other servers.  Internally, S could simply use a sequence object to
handle the atomic update requirement.  When S's next internal sequence
value is at 7, for example, that means it has allocated IDs through
699,999.

This removes the bottleneck on serial generation at the expense of
(infrequent) dependencies on the seed server S, and the serials are
not time-ordered.  A few more details must be handled, and there is
some maintainance overhead, but it seems to work fairly well.

Cheers,
Ed Loehr

Re: [GENERAL] using ID as a key

From

"Ross J. Reedstrom"

Date:

07 February 2000, 11:37:12

On Mon, Feb 07, 2000 at 09:58:57AM -0600, Ed Loehr wrote:

<snip mention of coordination vi DB server>

> These are resolvable problems.  One way to do a programmatic ID
> generation as David B. suggests is to have a DB server whose purpose
> is to issue "seed numbers" which designate, through use of a
> pre-determined multiplier, a *range* of allocated IDs guaranteed to be
> unique across all seed requesters.  This avoids both of the problems
> raised above (throughput and synchronization).
>

<snip detailed explanation>

Hmm, Ed, that sounds like a very nice description of the pgsql sequence
object. See:

http://www.postgresql.org/docs/user/sql-createsequence.htm

In particular, the 'cache' option sets the number of sequence numbers
allocated to a particular backend for any call to 'nextval' on that
sequence, exactly as you described. The only difference I see is your
description wasn't clear about how your serial allocations interact
with transactions. Sounds like you want the 'cache' to be server specific
(a1-a4) and span transactions. The postgresql solution has been to make
the cache of numbers connection specific, so any that don't get used
for a particular connection get 'wasted' when that connection closes.

Also, cache size is a property of the sequence object, rather than the
connection or individual call, so even if your program knows it's going
to need, say 1000 sequence values at a crack, but another backend is
only going to need one at a time, and rarely, there's no way to optimize
the cache size for both backends. Might be an intersting extension to
the sequence object: a call that increases the cache size for this
connection only: that way, if your doing a bulk load, for example,
you can grab numbers in large blocks, and just fire away.

>
> This removes the bottleneck on serial generation at the expense of
> (infrequent) dependencies on the seed server S, and the serials are
> not time-ordered.  A few more details must be handled, and there is
> some maintainance overhead, but it seems to work fairly well.
>

Yes, it does. ;-)

Ross
--
Ross J. Reedstrom, Ph.D., <reedstrm@rice.edu>
NSBRI Research Scientist/Programmer
Computer and Information Technology Institute
Rice University, 6100 S. Main St.,  Houston, TX 77005

Re: [GENERAL] using ID as a key

From

Date:

07 February 2000, 11:38:12


On Mon, 7 Feb 2000, Ed Loehr wrote:

> These are resolvable problems.  One way to do a programmatic ID
> generation as David B. suggests is to have a DB server whose purpose
> is to issue "seed numbers" which designate, through use of a
> pre-determined multiplier, a *range* of allocated IDs guaranteed to be
> unique across all seed requesters.  This avoids both of the problems
> raised above (throughput and synchronization).
  .....
> not time-ordered.  A few more details must be handled, and there is
> some maintainance overhead, but it seems to work fairly well.

sounds familiar -- the seed server can be pg, and other servers can be
user-program, and pg's sequence can do this already.
and, it seems not "programmatically at all.

Re: [GENERAL] using ID as a key

From

Ed Loehr

Date:

07 February 2000, 14:21:16

"Ross J. Reedstrom" wrote:
>
> > ...  One way to do a programmatic ID
> > generation as David B. suggests is to have a DB server whose purpose
> > is to issue "seed numbers" which designate, through use of a
> > pre-determined multiplier, a *range* of allocated IDs guaranteed to be
> > unique across all seed requesters.  This avoids both of the problems
> > raised above (throughput and synchronization).
>
> ... sounds like a very nice description of the pgsql sequence object.
>
> In particular, the 'cache' option sets the number of sequence numbers
> allocated to a particular backend for any call to 'nextval' on that
> sequence, exactly as you described. ...
>
> Also, cache size is a property of the sequence object, rather than the
> connection or individual call, so even if your program knows it's going
> to need, say 1000 sequence values at a crack, but another backend is
> only going to need one at a time, and rarely, there's no way to optimize
> the cache size for both backends. Might be an intersting extension to
> the sequence object: a call that increases the cache size for this
> connection only: that way, if your doing a bulk load, for example,
> you can grab numbers in large blocks, and just fire away.

Ya know, I'd read (and totally forgotten) the cache option on postgres
sequence objects.  But yes, it sounds like a very similar idea in a
different implementation (and not my idea).

I see your point about the missing ability to optimize the allocation
range.  Our approach had a built-in assumption that all needs were for
roughly the same range size (lots of webservers, relatively balanced
load), which was fine.  Your idea definitely would add useful
flexibility.

We didn't use transactions in our implementation, so it's an open
question for me.  Re managing concurrent transactions and ID
generation, one possibility in a many-host system in which scalability
matters might be to have the "local" servers keep their own (partial?)
database and manage the transaction issue with a "local" sequence
object.  It would have to make sure to constrain the domain of the
local sequence object to that range which was allocated from the
central seed server.  I suppose one might need the central seed server
to be using a sequence object as well.

Cheers,
Ed Loehr

Re: [GENERAL] using ID as a key

From

Ed Loehr

Date:

07 February 2000, 14:36:24

kaiq@realtyideas.com wrote:
>
> On Mon, 7 Feb 2000, Ed Loehr wrote:
>
> > These are resolvable problems.  One way to do a programmatic ID
> > generation as David B. suggests is to have a DB server whose purpose
> > is to issue "seed numbers" which designate, through use of a
> > pre-determined multiplier, a *range* of allocated IDs guaranteed to be
> > unique across all seed requesters.  This avoids both of the problems
> > raised above (throughput and synchronization).
>
> sounds familiar -- the seed server can be pg, and other servers can be
> user-program, and pg's sequence can do this already.

Yes, coincidentally, very similar to sequence cache functionality in
principle, as someone pointed out. [Wish I'd had those sequence
objects when we did the implementation...]

> and, it seems not "programmatically at all.

What would make it "programmatic" in my view would be calling a
function, as in "$newID = GetNewID()", prior to INSERT and then using
the returned ID value in your INSERT, rather than using a 'default
nextval' to get the value.  I don't see why one wouldn't want to use a
sequence object within GetNewID(), FWIW.  But sounds like it is not
the kind of programmatic example/explanation you were looking for...

Cheers,
Ed Loehr

Re: [GENERAL] using ID as a key

From

"Ross J. Reedstrom"

Date:

07 February 2000, 15:10:14

On Mon, Feb 07, 2000 at 01:37:21PM -0600, Ed Loehr wrote:
> kaiq@realtyideas.com wrote:
> >
> > and, it seems not "programmatically at all.
>
> What would make it "programmatic" in my view would be calling a
> function, as in "$newID = GetNewID()", prior to INSERT and then using
> the returned ID value in your INSERT, rather than using a 'default
> nextval' to get the value.  I don't see why one wouldn't want to use a
> sequence object within GetNewID(), FWIW.  But sounds like it is not
> the kind of programmatic example/explanation you were looking for...
>

In fact, that's exactly how a number of core psql developers recommend
handling the problem of how to get the new value just assigned by
a default nextval() clause: don't use the default, do:

$newID = SELECT nextval('my_ID_seq')

INSERT INTO my_table (my_id,somethng,otherthng) VALUES ($newID, $some, $other)

or equivalent, so you've already got the ID in hand.

Personally, I use the SELECT curval('seq_name') construct.

Ross

--
Ross J. Reedstrom, Ph.D., <reedstrm@rice.edu>
NSBRI Research Scientist/Programmer
Computer and Information Technology Institute
Rice University, 6100 S. Main St.,  Houston, TX 77005

Re: [GENERAL] using ID as a key

From

Date:

07 February 2000, 15:44:15

either way, I do not think it's "programmatic", I assume
nobody think they are?

On Mon, 7 Feb 2000, Ross J. Reedstrom wrote:

> On Mon, Feb 07, 2000 at 01:37:21PM -0600, Ed Loehr wrote:
> > kaiq@realtyideas.com wrote:
> > >
> > > and, it seems not "programmatically at all.
> >
> > What would make it "programmatic" in my view would be calling a
> > function, as in "$newID = GetNewID()", prior to INSERT and then using
> > the returned ID value in your INSERT, rather than using a 'default
> > nextval' to get the value.  I don't see why one wouldn't want to use a
> > sequence object within GetNewID(), FWIW.  But sounds like it is not
> > the kind of programmatic example/explanation you were looking for...
> >
>
> In fact, that's exactly how a number of core psql developers recommend
> handling the problem of how to get the new value just assigned by
> a default nextval() clause: don't use the default, do:
>
> $newID = SELECT nextval('my_ID_seq')
>
> INSERT INTO my_table (my_id,somethng,otherthng) VALUES ($newID, $some, $other)
>
> or equivalent, so you've already got the ID in hand.
>
> Personally, I use the SELECT curval('seq_name') construct.
>
> Ross
>
> --
> Ross J. Reedstrom, Ph.D., <reedstrm@rice.edu>
> NSBRI Research Scientist/Programmer
> Computer and Information Technology Institute
> Rice University, 6100 S. Main St.,  Houston, TX 77005
>
>
> ************
>

Re: [GENERAL] using ID as a key

From

Marten Feldtmann

Date:

07 February 2000, 22:02:19

> davidb@vectormath.com wrote:
> >
> > Hi Sheila,
> >
> > For general database design considerations (not specific to Postgres) I
> > disagree with the others on the use of serials and sequences.  These
> > things never migrate well from platform to platform, they often break, and
> > dealing with them is a nightmare if you ever have to do any bulk data
> > copying.
> > ID generation ought to be handled programmatically.
>
> At the server? This is what OIDs do - alas, these are even less portable
> than serials and sequences. At the client interface? Nice, as long as
> you have a single user database. In a multiuser environment, generating
> and maintaining a unique ID externally to the database is close to

 The are algorithm out there which handles this problem very well and
perhaps much better than a database can do and they define a key in a
complete portable way and it needs only some queries to the database
at the beginning of your application and for further key generation
the's no further communication needed in a multi-user environment.

 The algortihm are available, the software is available and there's
no need for the database to offer this special feature ... but of
course it's another feature ("marketing") and it's more handy to use
if you do many "manual" sql execution rather then using this
database in an application environement.

 I've described such an algorithm on a pgsql list ("High-low algorithm")
some weeks ago. Papers are out in the internet - they are mostly described
in papers which talk about "object - rdbms" mapping.

 Marten

Re: [GENERAL] using ID as a key

From

Marten Feldtmann

Date:

07 February 2000, 22:02:19

And here's the description of the "high-low" algorithm for
programs to create unique id's in a multi-user environement.


*** Structure of the identifier ***

Contents of the identifier:

 a) session-id
 b) current number within session
 c) class id

These are three numbers. You may create a unique string to get
the id value into a single column.

In our product we decided to print each number to base 36 (to get well
defined ascii string. Fill the number a) and b) with special
characters (e.g. '#') to get strings with lengh of 6.

Then we do the same with c) but the max string length is here 3.

The total size of the id is stored in a column defined via
char(15). You will define an index on the column.

Very nice would be an index handling something like "right(id,3)",
because then you may not only query for a specific id value but
also for all instances of a special class.

*** Structure of the table doing the initial information transfer ***

We need a special table on the database server. They have three columns:

  FX1 -- holding the next valid id within this session
  FX2 -- holding the session number

A row can be written as (internal/session). These rows can be
seen as parameters which can be used from clients to generate unique
identifier.

*** How does it work ***

In the beginning the session table is empty or holds some session
informations. The starting client locks the table.

 -> If the session-table is empty the client inserts a pair (0/2).
    (session 2, id 0 within this session)

    and uses 1 as it's own session number and 0 as the id number.


 -> if the session-table is not empty is looks for the rows with the
    highest (h) and lowest session number (l).

    -> if both numbers are equal it stores a new row into the session
       table the value-pair (0/h+1) and uses the row (h) for further
       work. It removes the row (h) - or actually updates the row (h)
       to become the row (h+1).

    -> otherwise the application removes this row with session number
       (l) and uses row (l) for further work.

The application unlocks the session-table.

*** After the initialization phase *

Now the application can create unique id's without doing a query
against the database (just be increment the id number within the
session). There may be the case where the application has created
so many objects that it uses all available numbers within a session:
ok, then it goesback to the initialization phase and calculates the
next session row.

If the application terminates is lockes the table and stores it's
actual values (?,session number) into the database. Other clients
may use this number in the future.

If the application crashes you may loose some id's -- but this is
not practically a problem.


If something is not clear - please ask.


Marten Feldtmann

Re: [GENERAL] using ID as a key

From

Date:

08 February 2000, 10:54:42

sounds intriguing. although it still use db, but because it
does not need any special db feature (table-locking is
common), it qualifys as "programmatical" solution.

however, not totally understood yet, let's see:

comparing to file locking (e.g. perl's flock)
1) locking is enforced. safer than flock, which is just cooperative;
   however, it need extra db session. seems flock is fast esp. if
   each app connection session only need to get one id.
2) it gets a block of id's to the client to reduce traffic to the
"central control".
   how about for each app connection session, usually only increase one?
   overhead?
3) because 2) may create a lot of holes, to make it better, somehow (the
algorithm) it can return id's to the pool. (then, the id's assigned are
not monotonous) -- unless the client crashes.

is that understanding correct?

also, how many columns in the "central control" table, 2 or 3?
> We need a special table on the database server. They have three
> columns:
>
>   FX1 -- holding the next valid id within this session
>   FX2 -- holding the session number

######################################################################
On Mon, 7 Feb 2000, Marten Feldtmann wrote:

> And here's the description of the "high-low" algorithm for
> programs to create unique id's in a multi-user environement.
>
>
> *** Structure of the identifier ***
>
> Contents of the identifier:
>
>  a) session-id
>  b) current number within session
>  c) class id
>
> These are three numbers. You may create a unique string to get
> the id value into a single column.
>
> In our product we decided to print each number to base 36 (to get well
> defined ascii string. Fill the number a) and b) with special
> characters (e.g. '#') to get strings with lengh of 6.
>
> Then we do the same with c) but the max string length is here 3.
>
> The total size of the id is stored in a column defined via
> char(15). You will define an index on the column.
>
> Very nice would be an index handling something like "right(id,3)",
> because then you may not only query for a specific id value but
> also for all instances of a special class.
>
> *** Structure of the table doing the initial information transfer ***
>
> We need a special table on the database server. They have three columns:
>
>   FX1 -- holding the next valid id within this session
>   FX2 -- holding the session number
>
> A row can be written as (internal/session). These rows can be
> seen as parameters which can be used from clients to generate unique
> identifier.
>
> *** How does it work ***
>
> In the beginning the session table is empty or holds some session
> informations. The starting client locks the table.
>
>  -> If the session-table is empty the client inserts a pair (0/2).
>     (session 2, id 0 within this session)
>
>     and uses 1 as it's own session number and 0 as the id number.
>
>
>  -> if the session-table is not empty is looks for the rows with the
>     highest (h) and lowest session number (l).
>
>     -> if both numbers are equal it stores a new row into the session
>        table the value-pair (0/h+1) and uses the row (h) for further
>        work. It removes the row (h) - or actually updates the row (h)
>        to become the row (h+1).
>
>     -> otherwise the application removes this row with session number
>        (l) and uses row (l) for further work.
>
> The application unlocks the session-table.
>
> *** After the initialization phase *
>
> Now the application can create unique id's without doing a query
> against the database (just be increment the id number within the
> session). There may be the case where the application has created
> so many objects that it uses all available numbers within a session:
> ok, then it goesback to the initialization phase and calculates the
> next session row.
>
> If the application terminates is lockes the table and stores it's
> actual values (?,session number) into the database. Other clients
> may use this number in the future.
>
> If the application crashes you may loose some id's -- but this is
> not practically a problem.
>
>
> If something is not clear - please ask.
>
>
> Marten Feldtmann
>
>
>
>
>
> ************
>

Re: [GENERAL] using ID as a key

From

Marten Feldtmann

Date:

08 February 2000, 22:02:37

> sounds intriguing. although it still use db, but because it
> does not need any special db feature (table-locking is
> common), it qualifys as "programmatical" solution.
>
> however, not totally understood yet, let's see:
>
> comparing to file locking (e.g. perl's flock)
> 1) locking is enforced. safer than flock, which is just cooperative;
>    however, it need extra db session. seems flock is fast esp. if
>    each app connection session only need to get one id.
> 2) it gets a block of id's to the client to reduce traffic to the
> "central control".
>    how about for each app connection session, usually only increase one?
>    overhead?
> 3) because 2) may create a lot of holes, to make it better, somehow (the
> algorithm) it can return id's to the pool. (then, the id's assigned are
> not monotonous) -- unless the client crashes.
>
> is that understanding correct?
>
> also, how many columns in the "central control" table, 2 or 3?

1)

 The creation statement for the table is:

 CREATE TABLE TX (FX1 int4, FX2 int4 not null)
 CREATE UNIQUE INDEX FX2IND ON TX (FX2)


 The table is locked via the following command in the start-up phase
of the application:

  BEGIN
  INSERT INTO TX (FX2) VALUES(-1)
  COMMIT

 If this statement does not produce an error, no other application
will get access to this table. Actually the application will try
to "lock" the table several times during start-up in case another
client has blocked it.


2) What happens for the first client.

 No client has prior made a connection to the table. Therefore the client
get the internal value pair (session=1, id=0) and writes the following
value pair to the table (session=2,id=0). The client "unlocks" the
table via "delete ... where fx2=-1"

 Now the client creates objects:

 1  0 class-id
 1  1 class-id
 1  2 class-id
 1  3 class-id

 Now the application shuts down and writes back:

 insert into ..... ( ) VALUES(1,4)

 What happens with the second client ?

 He now finds two rows with value pairs (1,4) and (2,0). He "locks"
the table and uses (1,4) for further work, remove the choosen value
pair from table and creates objects with:

 1 4 class-id
 1 5 class-id

 Now the third client is coming.

 He locks table, finds only (2,0), uses them, remove this pair from
table and insert (3,0), unlock the table and creates objects:

 2 0 class-id
 2 1 class-id
 2 2 class-id

 Now the second client terminates, writes back their value pair, then the
third client writes pack their value pair and we have the following pairs
within the table:

 1 6
 2 3
 3 0

 waiting for further clients to ask for values ...

3) Some further considerations about the work which has to be done on
the client. With the values above (6 digits to base 36) we can create
up to (36^6)-1 with a "fresh" value pair (x,0) without any further
communication to the database. (Therefore you may have up to
2.176.782.335 id for internal usage).

 On the other side you may have up to (2^36)-1 sessions.

 The client has to check for an overflow for each new id he creates

 if newid >=2176782335 then
   "get new session pair from database and forget your old one"

4) With the values below you may recognise  46.656 different classes
(or tables) which should be enough.

5) You may change these values (6/6/3) to other values which seems
to be better.

6) You may calculate how long it will take before you have no ids
available any longer ... it's quite a long time.

 The idea behind this has been mentioned in several papers on the
Internet mentioned as "high-low" algorithm. Actually I've seen this
code in one wrapper product.

 Marten