Thread: Better alternative for Primary Key then serial??

Better alternative for Primary Key then serial??

From
pilzner
Date:
Hi - I'm new to PostGres, but have used MSSQL for about a year. I'm going
through the documentation, but after reading about serials have a lot of
worries about keeping referential integrity in place and other things.
Specifically, here are a few scenarios:

a.)
CREATE TABLE TestTable (
TestID SERIAL NOT NULL PRIMARY KEY,
TestData varchar(20) NOT NULL
);

INSERT INTO TestTable(TestData) VALUES ('Data1');
INSERT INTO TestTable(TestData) VALUES ('Data2');
INSERT INTO TestTable(TestData) VALUES ('Data3');

UPDATE TestTable SET TestID = 10 WHERE TestID = 1;


Ok, red flag for me right here. The above works just fine. Granted, if
another table referenced the row w/ TestID = 1, it should violate foreign
key constraints and error out. However, with the use of serial, this is
going to run into another problem, down the road right?? Demonstrated here:

b.)
CREATE TABLE TestTable2 (
TestID SERIAL NOT NULL PRIMARY KEY,
TestData varchar(20) NOT NULL
);

INSERT INTO TestTable2(TestID, TestData) VALUES (1, 'DataData');

INSERT INTO TestTable(TestData) VALUES ('NextData');
--duplicate key violation occurs

INSERT INTO TestTable(TestData) VALUES ('NextData');
--Works fine

To phrase what happens, the next number from serial is '1', but that number
was already explicitly entered. The next call works, because the next serial
number is '2'. Ideally, the first insert would -never- happen and TestID
wouldn't ever be explicitly given a value, but if it were, its a guaranteed
error down the road some ways.

Does stuff like this cause any aches and pains to developers out there, or
do I just need to get in a new mindset??? Also, is there a way to be sure
the primary key is *ONLY* ever given a value by serial, and not subject to
updates???

Thanks,




--
View this message in context:
http://www.nabble.com/Better-alternative-for-Primary-Key-then-serial---tp14289409p14289409.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


Re: Better alternative for Primary Key then serial??

From
Adrian Klaver
Date:
On Tuesday 11 December 2007 9:42 pm, pilzner wrote:
> Hi - I'm new to PostGres, but have used MSSQL for about a year. I'm going
> through the documentation, but after reading about serials have a lot of
> worries about keeping referential integrity in place and other things.
> Specifically, here are a few scenarios:
>
> a.)
> CREATE TABLE TestTable (
> TestID SERIAL NOT NULL PRIMARY KEY,
> TestData varchar(20) NOT NULL
> );
>
> INSERT INTO TestTable(TestData) VALUES ('Data1');
> INSERT INTO TestTable(TestData) VALUES ('Data2');
> INSERT INTO TestTable(TestData) VALUES ('Data3');
>
> UPDATE TestTable SET TestID = 10 WHERE TestID = 1;
>
>
> Ok, red flag for me right here. The above works just fine. Granted, if
> another table referenced the row w/ TestID = 1, it should violate foreign
> key constraints and error out. However, with the use of serial, this is
> going to run into another problem, down the road right?? Demonstrated here:
>
> b.)
> CREATE TABLE TestTable2 (
> TestID SERIAL NOT NULL PRIMARY KEY,
> TestData varchar(20) NOT NULL
> );
>
> INSERT INTO TestTable2(TestID, TestData) VALUES (1, 'DataData');
>
> INSERT INTO TestTable(TestData) VALUES ('NextData');
> --duplicate key violation occurs
>
> INSERT INTO TestTable(TestData) VALUES ('NextData');
> --Works fine
>
> To phrase what happens, the next number from serial is '1', but that number
> was already explicitly entered. The next call works, because the next
> serial number is '2'. Ideally, the first insert would -never- happen and
> TestID wouldn't ever be explicitly given a value, but if it were, its a
> guaranteed error down the road some ways.
>
> Does stuff like this cause any aches and pains to developers out there, or
> do I just need to get in a new mindset??? Also, is there a way to be sure
> the primary key is *ONLY* ever given a value by serial, and not subject to
> updates???
>
> Thanks,

Each table that has SERIAL column created gets its own sequence, so there will
be no conflict between tables. That case would only arise if you assigned the
same sequence to multiple tables using DEFAULT nextval("some_sequence") and
mixed manual updating of the sequence and auto updating.  By default a
sequence will always increment forward so you will have a fresh number for
the next request. This means a sequence can have holes as it increments even
if a transaction fails. You can create a duplicate key violation within a
single table by manualling entering a SERIAL id that was already generated.
This applies to any PRIMARY KEY and is sort of the point. The best thing to
do is let the SERIAL sequence work on it own.  If you want to deal with
sequences you should take a look at:
http://www.postgresql.org/docs/8.2/interactive/functions-sequence.html

As to preventing updates. You have a couple of choices.
1)Do not let that field be changed by the user. I usually in either hid the
field or prevented data entry on that field.
2) Create an ON UPDATE TRIGGER that does what you want with the field.

--
Adrian Klaver
aklaver@comcast.net

Re: Better alternative for Primary Key then serial??

From
Alvaro Herrera
Date:
pilzner wrote:

> Does stuff like this cause any aches and pains to developers out there, or
> do I just need to get in a new mindset??? Also, is there a way to be sure
> the primary key is *ONLY* ever given a value by serial, and not subject to
> updates???

It doesn't.  Just do not update the ID -- what use do you have for that
anyway?  If you want to prevent it, you can put a trigger to the column,
but IMHO it would be a waste of your time and machine resources.

--
Alvaro Herrera                         http://www.flickr.com/photos/alvherre/
"Vivir y dejar de vivir son soluciones imaginarias.
La existencia está en otra parte" (Andre Breton)

Re: Better alternative for Primary Key then serial??

From
pilzner
Date:


Alvaro Herrera-3 wrote:
>
> Just do not update the ID -- what use do you have for that
> anyway?  If you want to prevent it, you can put a trigger to the column,
> but IMHO it would be a waste of your time and machine resources.
>

I have absolutely no use to update the ID. I'm not sure why anyone ever
would, and I guess I was a little shocked to find that PostGres even allows
it.

In MSSQL, an identity() is used instead of a serial, and once in place that
sucker is pretty much set in stone without a little know-how (it won't
happen by accident). I'm definitely not here for a "my way is better because
thats what I'm familiar with" discussion, just to get a feel of why its done
that way, if I'm doing anything wrong, or if there is an accepted way to
lock it down.
--
View this message in context:
http://www.nabble.com/Better-alternative-for-Primary-Key-then-serial---tp14289409p14303222.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


Re: Better alternative for Primary Key then serial??

From
Alvaro Herrera
Date:
pilzner wrote:
>
>
>
> Alvaro Herrera-3 wrote:
> >
> > Just do not update the ID -- what use do you have for that
> > anyway?  If you want to prevent it, you can put a trigger to the column,
> > but IMHO it would be a waste of your time and machine resources.
> >
>
> I have absolutely no use to update the ID. I'm not sure why anyone ever
> would, and I guess I was a little shocked to find that PostGres even allows
> it.

Oh, I see.  Fact is, Postgres defines it like a plain integer column and
attaches a DEFAULT clause.  That's it.  There's no magic to prevent it
from being modified.

The SQL standard specifies a strange beast called GENERATED BY which has
all sorts of funny behaviors, one of which is what you describe.  There
have been attempts at implementing the semantics described be the spec,
but they are so contorted that so far there hasn't been definitive
success.  It is expected that in the next release (8.4) something will
get done about it.

--
Alvaro Herrera                        http://www.advogato.org/person/alvherre
"How strange it is to find the words "Perl" and "saner" in such close
proximity, with no apparent sense of irony. I doubt that Larry himself
could have managed it."         (ncm, http://lwn.net/Articles/174769/)

Re: Better alternative for Primary Key then serial??

From
Jorge Godoy
Date:
Em Wednesday 12 December 2007 03:42:55 pilzner escreveu:
>
> Does stuff like this cause any aches and pains to developers out there, or
> do I just need to get in a new mindset??? Also, is there a way to be sure
> the primary key is *ONLY* ever given a value by serial, and not subject to
> updates???

Shouldn't the definition of a primary key be an immutable thing that is unique
to the row?  If you change it, then it is not immutable anymore...

You can also have triggers to prevent PK updates...  But I wouldn't go this
route.

Why are you needing updating the PKs of your data?


--
Jorge Godoy      <jgodoy@gmail.com>


Re: Better alternative for Primary Key then serial??

From
Alban Hertroys
Date:
On Dec 13, 2007, at 10:19, Jorge Godoy wrote:

> Em Wednesday 12 December 2007 03:42:55 pilzner escreveu:
>>
>> Does stuff like this cause any aches and pains to developers out
>> there, or
>> do I just need to get in a new mindset??? Also, is there a way to
>> be sure
>> the primary key is *ONLY* ever given a value by serial, and not
>> subject to
>> updates???
>
> Shouldn't the definition of a primary key be an immutable thing
> that is unique
> to the row?  If you change it, then it is not immutable anymore...

Why? If you're worried about foreign keys pointing to them, you can
either define them to follow the change (on update cascade) or to
throw an integrity violation error (default behaviour).

Now I realise this isn't particularly useful for surrogate (primary)
keys, but it sure is for natural keys. They're both keys, is there a
reason to handle surrogate keys differently from natural keys?

The problem the OP is pointing out seems difficult to solve. A
sequence doesn't know about existing records with a possibly higher
number than the sequence is at.

This may be worked around by keeping a list of numbers used up beyond
the current sequence value so the sequence knows what numbers to
skip, but that has problems of its own (if there are many such
numbers, or if the sequence gets created after data has been added to
the list). It gets ugly.

The convention with sequences is that if you use a sequence on a
column (beyond defining one) that you don't insert records with hand-
coded values for that column (unless you're sure you're using an
existing gap before the sequences current value).

Regards,
--
Alban Hertroys

                Sometimes you wake up thinking:
                "Galileo was right, the world does turn"






!DSPAM:737,476112479655680816383!



Re: Better alternative for Primary Key then serial??

From
"John D. Burger"
Date:
Alban Hertroys wrote:

> The problem the OP is pointing out seems difficult to solve. A
> sequence doesn't know about existing records with a possibly higher
> number than the sequence is at.
>
> This may be worked around by keeping a list of numbers used up
> beyond the current sequence value so the sequence knows what
> numbers to skip, but that has problems of its own (if there are
> many such numbers, or if the sequence gets created after data has
> been added to the list). It gets ugly.

I would just have a trigger that sets the serial to NEW.id + 1.
Dunno if this has concurrency issues, though, and it may leave huge
gaps in the key space, and (more importantly) use up your sequence
too quickly.

I have, in fact, had situations where I wanted a serial PK, =and= I
needed to insert with external IDs sometimes - essentially a mix of
natural and surrogate keys (shudder).  It turned out that the natural
keys were always positive, so I set up the sequence to range
=downward= from 0.

- John D. Burger
   MITRE

Re: Better alternative for Primary Key then serial??

From
Andrew Sullivan
Date:
On Wed, Dec 12, 2007 at 12:28:37PM -0800, pilzner wrote:
> thats what I'm familiar with" discussion, just to get a feel of why its done
> that way, if I'm doing anything wrong, or if there is an accepted way to
> lock it down.

It'd be easy to lock down with a trigger that RAISEs ERROR in case OLD.id !=
NEW.id.

A

Re: Better alternative for Primary Key then serial??

From
Alban Hertroys
Date:
On Dec 13, 2007, at 14:12, John D. Burger wrote:

> Alban Hertroys wrote:
>
>> The problem the OP is pointing out seems difficult to solve. A
>> sequence doesn't know about existing records with a possibly
>> higher number than the sequence is at.
>>
>> This may be worked around by keeping a list of numbers used up
>> beyond the current sequence value so the sequence knows what
>> numbers to skip, but that has problems of its own (if there are
>> many such numbers, or if the sequence gets created after data has
>> been added to the list). It gets ugly.
>
> I would just have a trigger that sets the serial to NEW.id + 1.
> Dunno if this has concurrency issues, though, and it may leave huge
> gaps in the key space, and (more importantly) use up your sequence
> too quickly.

That thought had crossed my mind, but I rejected it.

Besides those issues, you'd need to add that trigger to every table
that potentially has this "issue", it does indeed have concurrency
issues, and you're in for some fun if someone decides to use the last
possible value of a serial field to define a special case (although
the sequence could be defined to end before that value of course) and
causes immediate wraparound of the sequence (provided it's allowed to
rotate) and thus causing duplicate key violations as soon as the
sequence matches the first record in the table.

I was looking at a general solution that would work as automatic as
sequences already do. Not that it's a big issue anyway, you just have
to keep in mind that it works like it does.

Just to prevent any mistakes, I am not requesting a change in
behaviour, I'm fine with how it works (and has worked for ages in
many database servers). It is an interesting puzzle though ;)

> I have, in fact, had situations where I wanted a serial PK, =and= I
> needed to insert with external IDs sometimes - essentially a mix of
> natural and surrogate keys (shudder).  It turned out that the
> natural keys were always positive, so I set up the sequence to
> range =downward= from 0.

That's a common solution to the problem, although it leaves the
possibility that people are being smart and enter negative integers
exactly to prevent this problem. And of course you cannot rely on
sorting it by index to get your data more or less in the order inserted.

--
Alban Hertroys

        "If you lose your memory,
         you can't remember where you left it."



!DSPAM:737,476419a19654199211162!