Thread: enums

enums

From
Andrew Dunstan
Date:
If people would like to play, I have created a little kit to help in 
creating first class enum types in a few seconds. It works something 
like this:
   make TYPENAME=rainbow ENUMS=' "red", "orange", "yellow", "green", 
"blue", "indigo", "violet" '   make TYPENAME=rainbow install   psql -f /path/to/contrib/rainbow-install.sql yourdb

and you are done. Now you can do:
   create table foo( r rainbow);   insert into foo values('red');   select 'red'::rainbow < 'green'::rainbow; <--
yieldstrue   select rainbow_order('yellow');  <-- yields 2
 

The kit is at  http://developer.postgresql.org/~adunstan/enumkit.tgz

Needs 8.0 or later, since it use PGXS.

Maximum number of values is 32767 - but if you use that many you're 
insane anyway :-)

I did this as part of thinking about how we might do enums properly. AS 
Chris KL recently noted - it is very often asked for. So this is not the 
end of the road, just a tiny step at the beginning.

cheers

andrew


Re: enums

From
"Jim C. Nasby"
Date:
This is cool; it's something people can use today if nothing else.
Long-term, is it practical to have the enums compiled in? ISTM that's
not very workable, but I'm completely guessing. The other issue is that
this version makes it very difficult to change what's in the enum (not
that that's at all easy with MySQL...)

On Thu, Oct 27, 2005 at 03:47:53PM -0400, Andrew Dunstan wrote:
> 
> If people would like to play, I have created a little kit to help in 
> creating first class enum types in a few seconds. It works something 
> like this:
> 
>    make TYPENAME=rainbow ENUMS=' "red", "orange", "yellow", "green", 
> "blue", "indigo", "violet" '
>    make TYPENAME=rainbow install
>    psql -f /path/to/contrib/rainbow-install.sql yourdb
> 
> and you are done. Now you can do:
> 
>    create table foo( r rainbow);
>    insert into foo values('red');
>    select 'red'::rainbow < 'green'::rainbow; <-- yields true
>    select rainbow_order('yellow');  <-- yields 2
> 
> The kit is at  http://developer.postgresql.org/~adunstan/enumkit.tgz
> 
> Needs 8.0 or later, since it use PGXS.
> 
> Maximum number of values is 32767 - but if you use that many you're 
> insane anyway :-)
> 
> I did this as part of thinking about how we might do enums properly. AS 
> Chris KL recently noted - it is very often asked for. So this is not the 
> end of the road, just a tiny step at the beginning.
> 
> cheers
> 
> andrew
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
>       choose an index scan if your joining column's datatypes do not
>       match
> 

-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


Re: enums

From
Ted Rolle
Date:
This little snippet is great!  The only problem I see is that the enums must be consistent across all modules.

What about loading a variable with a "default" value?  Then it could be adjusted to 'play'.

On 10/27/05, Jim C. Nasby < jnasby@pervasive.com> wrote:
This is cool; it's something people can use today if nothing else.
Long-term, is it practical to have the enums compiled in? ISTM that's
not very workable, but I'm completely guessing. The other issue is that
this version makes it very difficult to change what's in the enum (not
that that's at all easy with MySQL...)

On Thu, Oct 27, 2005 at 03:47:53PM -0400, Andrew Dunstan wrote:
>
> If people would like to play, I have created a little kit to help in
> creating first class enum types in a few seconds. It works something
> like this:
>
>    make TYPENAME=rainbow ENUMS=' "red", "orange", "yellow", "green",
> "blue", "indigo", "violet" '
>    make TYPENAME=rainbow install
>    psql -f /path/to/contrib/rainbow-install.sql yourdb
>
> and you are done. Now you can do:
>
>    create table foo( r rainbow);
>    insert into foo values('red');
>    select 'red'::rainbow < 'green'::rainbow; <-- yields true
>    select rainbow_order('yellow');  <-- yields 2
>
> The kit is at  http://developer.postgresql.org/~adunstan/enumkit.tgz
>
> Needs 8.0 or later, since it use PGXS.
>
> Maximum number of values is 32767 - but if you use that many you're
> insane anyway :-)
>
> I did this as part of thinking about how we might do enums properly. AS
> Chris KL recently noted - it is very often asked for. So this is not the
> end of the road, just a tiny step at the beginning.
>
> cheers
>
> andrew


Re: enums

From
"Jim C. Nasby"
Date:
On Thu, Oct 27, 2005 at 04:54:36PM -0400, Ted Rolle wrote:
> This little snippet is great! The only problem I see is that the enums must
> be consistent across all modules.
> 
> What about loading a variable with a "default" value? Then it could be
> adjusted to 'play'.

Huh? Sorry, but you completely lost me here...

On another note, I noticed that the comparison operators seem to be
comparing the underlying numeric value used to store the enum, which is
wrong IMO. Consider:

ENUM color '"red","blue","green"'
CREATE TABLE t (c color);
INSERT INTO t VALUES('blue');
INSERT INTO t VALUES('green');
INSERT INTO t VALUES('red');
SELECT c FROM t ORDER BY c;
red
blue
green

That seems counter-intuitive. It's also exposing an implimentation
detail (that the enum is stored internally as a number).
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


Re: enums

From
"Jim Nasby"
Date:
Adding -hackers back to the list...

> -----Original Message-----
> From: Gregory Maxwell [mailto:gmaxwell@gmail.com]
> Sent: Thursday, October 27, 2005 5:03 PM
> To: Jim Nasby
> Subject: Re: [HACKERS] enums
>
>
> On 10/27/05, Jim C. Nasby <jnasby@pervasive.com> wrote:
> > On Thu, Oct 27, 2005 at 04:54:36PM -0400, Ted Rolle wrote:
> > > This little snippet is great! The only problem I see is
> that the enums must
> > > be consistent across all modules.
> > >
> > > What about loading a variable with a "default" value?
> Then it could be
> > > adjusted to 'play'.
> >
> > Huh? Sorry, but you completely lost me here...
> >
> > On another note, I noticed that the comparison operators seem to be
> > comparing the underlying numeric value used to store the
> enum, which is
> > wrong IMO. Consider:
> >
> > ENUM color '"red","blue","green"'
> > CREATE TABLE t (c color);
> > INSERT INTO t VALUES('blue');
> > INSERT INTO t VALUES('green');
> > INSERT INTO t VALUES('red');
> > SELECT c FROM t ORDER BY c;
> > red
> > blue
> > green
> >
> > That seems counter-intuitive. It's also exposing an implimentation
> > detail (that the enum is stored internally as a number).
>
> You could as equally say that it's ordering it by the order of the
> enum declaration, which seems quite reasonable to me.

I don't really see why that's considered reasonable, especially as a default. I could maybe see an argument for having
ameans to see this ordering, but IMO anything depending on that is broken. I don't think we should be making any
guaranteesabout how enums are stored in the database (including ordering). 

> Now, if you can multiply one enum with another or with an integer,
> that would be a bit odd. But the collation behavior seems quite sane
> to me.
>
> The behavior in mysql is the same:
>
> mysql> create table t (
>     -> color enum ('red','blue','green')
>     -> );
> mysql> INSERT INTO t VALUES ('blue');
> mysql> INSERT INTO t VALUES ('green');
> mysql> INSERT INTO t VALUES ('red');
> mysql> select color from t order by color;
> +-------+
> | color |
> +-------+
> | red   |
> | blue  |
> | green |
> +-------+
> 3 rows in set (0.04 sec)
>
> Hopefully we the end implimentation is a bit better than mysql and
> actually cares what you're inserting:
>
> mysql> select color*2 from t;
> +---------+
> | color*2 |
> +---------+
> |       4 |
> |       6 |
> |       2 |
> +---------+
> 3 rows in set (0.00 sec)
> mysql> INSERT INTO t VALUES (1);
> Query OK, 1 row affected (0.00 sec)
> mysql> INSERT INTO t VALUES ('monkey');
> Query OK, 1 row affected, 1 warning (0.00 sec)
> mysql> select color from t;
> +-------+
> | color |
> +-------+
> | blue  |
> | green |
> | red   |
> | red   |
> |       |
> +-------+
> 5 rows in set (0.00 sec)

Your examples show why I don't think it's a good idea to use MySQL as a guide for how to do enums.

If we do decide to include the concept of ordering in enums, then it should be fully supported and not just an artifact
ofour storage mechanism. This means supporting things like being able to re-order the accepted values in an enum. But
likeI said, I just don't see the use case for doing that. 


Re: enums

From
Gregory Maxwell
Date:
On 10/27/05, Jim Nasby <jnasby@pervasive.com> wrote:
> Adding -hackers back to the list...
> > You could as equally say that it's ordering it by the order of the
> > enum declaration, which seems quite reasonable to me.
>
> I don't really see why that's considered reasonable, especially as a default. I could maybe see an argument for
havinga means to see this ordering, but IMO anything depending on that is broken. I don't think we should be making any
guaranteesabout how enums are stored in the database (including ordering). 

> Your examples show why I don't think it's a good idea to use MySQL as a guide for how to do enums.

Yes, MySQL is broken in some regards, as usual. However, the API isn't
bad (except for the fact that it doesn't care what invalid crap you
throw at it), and more importantly there are thousands of apps and
developers who think around that interface. We should copy it without
the brokenness as much as possible unless we have good cause
otherwise.

> If we do decide to include the concept of ordering in enums, then it should be fully supported and not just an
artifactof our storage mechanism. This means supporting things like being able to re-order the accepted values in an
enum.But like I said, I just don't see the use case for doing that. 

So what do you propose we do for a default ordering?  I hope you don't
think we should force a sort as though the enum labels were text...
That almost certainly incorrect for most applications of enums, which
are used to make opaque labels more human compatible.

MySQL's behavior of allowing the user to specify the collation in the
typedef makes a lot of sense to me, it doesn't matter that it actually
works as an artifact of the storage backend. I'd argue that it would
make sense to sort by the specification order even if we changed the
backend to use varchars rather than numbers.


Re: enums

From
Andrew Dunstan
Date:

Jim C. Nasby wrote:

>
>On another note, I noticed that the comparison operators seem to be
>comparing the underlying numeric value used to store the enum, which is
>wrong IMO. Consider:
>
>ENUM color '"red","blue","green"'
>CREATE TABLE t (c color);
>INSERT INTO t VALUES('blue');
>INSERT INTO t VALUES('green');
>INSERT INTO t VALUES('red');
>SELECT c FROM t ORDER BY c;
>red
>blue
>green
>
>That seems counter-intuitive. It's also exposing an implimentation
>detail (that the enum is stored internally as a number).
>  
>

No it is not. Not in the slightest. It is honoring the enumeration order 
defined for the type. That is the ONLY correct behaviour, IMNSHO. 
Otherwise, you could just as easily use a domain with a check constraint.

In fact, mysql's behaviour is laughably, even ludicrously, inconsistent:

mysql> select color from t order by color;
+-------+
| color |
+-------+
| red   |
| blue  |
| green |
+-------+
3 rows in set (0.06 sec)

mysql> select * from t where color < 'green';
+-------+
| color |
+-------+
| blue  |
+-------+

So for "order by" it honors the enumeration order, but for < it uses the 
lexical ordering. Lovely, eh?

cheers

andrew



Re: enums

From
"Jim C. Nasby"
Date:
On Thu, Oct 27, 2005 at 06:46:24PM -0400, Gregory Maxwell wrote:
> So what do you propose we do for a default ordering?  I hope you don't
> think we should force a sort as though the enum labels were text...

I do think that. Or default ordering on whatever type the enum is (I can
see enums that are something other than text as useful, though that's a
secondary goal).

> That almost certainly incorrect for most applications of enums, which
> are used to make opaque labels more human compatible.

Sorting red before blue doesn't sound very opaque to me...

> MySQL's behavior of allowing the user to specify the collation in the
> typedef makes a lot of sense to me, it doesn't matter that it actually
> works as an artifact of the storage backend. I'd argue that it would
> make sense to sort by the specification order even if we changed the
> backend to use varchars rather than numbers.

Like I said, if we're going to support a concept of ordering of items in
an enum then we need to support it fully. For starters that means having
the ability to re-order things in an enum seamlessly.

If our primary concern is MySQL compatability then we should look at
offering two types of enums; one that mirrors their broken stuff and one
that works they way you'd actually want it to.
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


Re: enums

From
Andrew Dunstan
Date:

Gregory Maxwell wrote:

>Yes, MySQL is broken in some regards, as usual. However, the API isn't
>bad (except for the fact that it doesn't care what invalid crap you
>throw at it), and more importantly there are thousands of apps and
>developers who think around that interface. We should copy it without
>the brokenness as much as possible unless we have good cause
>otherwise.
>  
>


mmm ... no. It is too broken. We should do enums orthogonally with other 
type definitions in PostgreSQL. Where I would like to get to is that we 
have a flavor of CREATE TYPE that will create the enum type for us, 
including all the support that I build into my little kit. And if you 
want to change the enumeration set on a column, you would use ALTER 
TABLE foo ALTER COLUMN  bar  TYPE newtype USING ...

Inline declarations of enums does not strike me as good.

>  
>
>>If we do decide to include the concept of ordering in enums, then it should be fully supported and not just an
artifactof our storage mechanism. This means supporting things like being able to re-order the accepted values in an
enum.But like I said, I just don't see the use case for doing that.
 
>>    
>>
>
>So what do you propose we do for a default ordering?  I hope you don't
>think we should force a sort as though the enum labels were text...
>That almost certainly incorrect for most applications of enums, which
>are used to make opaque labels more human compatible.
>  
>

Yeah, lexical ordering is surely wrong. I believe that the enumeration 
order is the only correct ordering. If you wanted lexical ordering, you 
could always do "order by myenum::text".

>MySQL's behavior of allowing the user to specify the collation in the
>typedef makes a lot of sense to me, it doesn't matter that it actually
>works as an artifact of the storage backend. I'd argue that it would
>make sense to sort by the specification order even if we changed the
>backend to use varchars rather than numbers.
>
>  
>

Can't see the point in it, I must confess.

cheers

andrew


Re: enums

From
Andrew Dunstan
Date:

Jim C. Nasby wrote:

>
>Like I said, if we're going to support a concept of ordering of items in
>an enum then we need to support it fully. For starters that means having
>the ability to re-order things in an enum seamlessly.
>  
>

I do not see this at all. An enumeration defines an ordering and a set 
of labels. Why should you be able to change it?  If you want a different 
ordering, create a new enumeration. Let's do this right because it's a 
feature worth having, not just mimic the competition's idiocy.


cheers

andrew


Re: enums

From
Michael Fuhr
Date:
On Thu, Oct 27, 2005 at 07:02:45PM -0400, Andrew Dunstan wrote:
> Jim C. Nasby wrote:
> >On another note, I noticed that the comparison operators seem to be
> >comparing the underlying numeric value used to store the enum, which is
> >wrong IMO. Consider:
> >
> >ENUM color '"red","blue","green"'
> >CREATE TABLE t (c color);
> >INSERT INTO t VALUES('blue');
> >INSERT INTO t VALUES('green');
> >INSERT INTO t VALUES('red');
> >SELECT c FROM t ORDER BY c;
> >red
> >blue
> >green
> >
> >That seems counter-intuitive. It's also exposing an implimentation
> >detail (that the enum is stored internally as a number).
> 
> No it is not. Not in the slightest. It is honoring the enumeration order 
> defined for the type. That is the ONLY correct behaviour, IMNSHO. 

I agree.  Honoring the enumeration order makes sense if you consider
the values as things that should be ordered based on some property
of their thingness instead of based on what their labels happen to
be in a particular language.  If I have an enumeration of colors I
might want values sorted by their position in the spectrum, so
whether the labels are (red, green, blue) or (gorri, berde, urdin)
I might want to maintain that particular order.

If you want values ordered lexically then you can enumerate them
that way.  Why force that behavior on people who want to order based
on some other criteria?

-- 
Michael Fuhr


Re: enums

From
Trent Shipley
Date:
On Thursday 2005-10-27 16:22, Andrew Dunstan wrote:
> Jim C. Nasby wrote:
> >Like I said, if we're going to support a concept of ordering of items in
> >an enum then we need to support it fully. For starters that means having
> >the ability to re-order things in an enum seamlessly.
>
> I do not see this at all. An enumeration defines an ordering and a set
> of labels. Why should you be able to change it?  If you want a different
> ordering, create a new enumeration. Let's do this right because it's a
> feature worth having, not just mimic the competition's idiocy
>

The symbols in the set have no _per se_ order.
A collation rule is necessary to sort the symbols consistently.
ASCII is an enumeration
Unicode is a large enumeration with a simple naive collation and a complex 
default collation.

Defining a set results in an unordered specification of symbols.
Defining a collation produces an ordering for the set.
There can be many collations for a set.

An enumeration is just a computer science short-hand way to define a set and a 
"native" collation for the set. 
An enumeration's native collation need not be the only, or even the most 
common, collation for the enumerated set of symbols.


Re: enums

From
Michael Glaesemann
Date:
On Oct 28, 2005, at 9:23 , Trent Shipley wrote:

> On Thursday 2005-10-27 16:22, Andrew Dunstan wrote:
>
>> Jim C. Nasby wrote:
>>
>>> Like I said, if we're going to support a concept of ordering of  
>>> items in
>>> an enum then we need to support it fully. For starters that means  
>>> having
>>> the ability to re-order things in an enum seamlessly.
>>>
>>
>> I do not see this at all. An enumeration defines an ordering and a  
>> set
>> of labels. Why should you be able to change it?  If you want a  
>> different
>> ordering, create a new enumeration. Let's do this right because  
>> it's a
>> feature worth having, not just mimic the competition's idiocy
>>
>>
>
> The symbols in the set have no _per se_ order.
> A collation rule is necessary to sort the symbols consistently.
> ASCII is an enumeration
> Unicode is a large enumeration with a simple naive collation and a  
> complex
> default collation.
>
> Defining a set results in an unordered specification of symbols.
> Defining a collation produces an ordering for the set.
> There can be many collations for a set.
>
> An enumeration is just a computer science short-hand way to define  
> a set and a
> "native" collation for the set.
> An enumeration's native collation need not be the only, or even the  
> most
> common, collation for the enumerated set of symbols.

Relational databases already have a type for unordered sets: tables.  
IMO, if there's going to be a separate enumerated type, it should be  
more than just an alternative way of defining a set of key-value pairs.

Michael Glaesemann
grzm myrealbox com






Re: enums

From
Andrew Dunstan
Date:

Trent Shipley wrote:

>
>An enumeration is just a computer science short-hand way to define a set and a 
>"native" collation for the set. 
>An enumeration's native collation need not be the only, or even the most 
>common, collation for the enumerated set of symbols.
>
>
>  
>

No it's not. Many languages define enumerated types as having a fixed 
ordering. It is not just a set. That is the sense in which I am using 
the term. And, after all, you can always write a function that gives you 
an alternative ordering. All we are building in is an ordering based in 
the enumeration order, which you are at perfect liberty not to use. 
Postgres lets you define an alternative operator class for any type, so 
we are very flexible. People are getting way too hung up over this.

cheers

andrew


Re: enums

From
"Jim C. Nasby"
Date:
On Thu, Oct 27, 2005 at 05:41:01PM -0600, Michael Fuhr wrote:
> If you want values ordered lexically then you can enumerate them
> that way.  Why force that behavior on people who want to order based
> on some other criteria?

Well, I was arguing about the default behavior. I'd bet that we're going
to have a constant set of people wondering why ORDER BY is doing the
'wrong thing' when ordering an ENUM, which is why I argued that the
default behavior should be ordering based on the external type, not how
we're storing it or some other order.

But I'm clearly in the minority in this view, so I'm droping it. :)

On to other issues...

Andrew, you mentioned that if you want to change the ordering you should
just create a new type. What about if you need to change the values that
are in the enum? MySQL does (or at least did, it's been some time since
I've messed with this) a horrible job at that. There's no way to rename
anything; you have to add the new names you want, then do a bulk update,
then delete the (now old) names. IMO this is broken.

Also, if we are going to maintain ordering and mapping (presumably via
the internal number that we're storing), then I think we should expose
that, at least optionally. So for example, you should be able to define
what a specific enum value means. Not everyone will want a linear
numbering starting at 0 afterall.
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


Re: enums

From
"Cristian Prieto"
Date:
What about use the declaration order as the enum order?, for example: if I
declare something like: "CREATE ENUM hola ('item1', 'item3', 'item2');"
-this is just assuming an hypothetical approach to use enum types in this
way- and the logical order of the items could be 'item1', 'item3', 'item2'
just because the user decided to create them in this way.

Well, that's just an idea...

-----Original Message-----
From: pgsql-hackers-owner@postgresql.org
[mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Andrew Dunstan
Sent: Jueves, 27 de Octubre de 2005 07:06 p.m.
To: tshipley@deru.com
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] enums



Trent Shipley wrote:

>
>An enumeration is just a computer science short-hand way to define a set
and a 
>"native" collation for the set. 
>An enumeration's native collation need not be the only, or even the most 
>common, collation for the enumerated set of symbols.
>
>
>  
>

No it's not. Many languages define enumerated types as having a fixed 
ordering. It is not just a set. That is the sense in which I am using 
the term. And, after all, you can always write a function that gives you 
an alternative ordering. All we are building in is an ordering based in 
the enumeration order, which you are at perfect liberty not to use. 
Postgres lets you define an alternative operator class for any type, so 
we are very flexible. People are getting way too hung up over this.

cheers

andrew

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate      subscribe-nomail command to
majordomo@postgresql.orgso that your      message can get through to the mailing list cleanly
 



Re: enums

From
Andrew Dunstan
Date:

Jim C. Nasby wrote:

>Andrew, you mentioned that if you want to change the ordering you should
>just create a new type. What about if you need to change the values that
>are in the enum? MySQL does (or at least did, it's been some time since
>I've messed with this) a horrible job at that. There's no way to rename
>anything; you have to add the new names you want, then do a bulk update,
>then delete the (now old) names. IMO this is broken.
>  
>


It would just be a standard "ALTER TABLE foo ALTER COLUMN bar TYPE 
newtype USING expression" operation. You would write a function that 
took a value of the old type and returned a value of the new type and 
use a cll to that function in the expression. Since these would be named 
types, unlike the case in mysql where they are anonymously defined 
inline, this would present no difficulties at all.

>Also, if we are going to maintain ordering and mapping (presumably via
>the internal number that we're storing), then I think we should expose
>that, at least optionally. So for example, you should be able to define
>what a specific enum value means. Not everyone will want a linear
>numbering starting at 0 afterall.
>  
>

What on earth for? Users should not care in the slightest what the 
internal representation is . Users who want a map where the values are 
exposed should create a lookup table.

You keep saying that we are using the internal representation as the 
ordering. This is simply the wrong way to look at it. The internal 
representation REFLECTS the ordering; it doesn't impose it. The user has 
imposed the ordering when defining the type. In my enumkit I did provide 
a function that gave back the internal representation, but I am not by 
any means certain that that's a good idea.

cheers

andrew


Re: enums

From
Andrew Dunstan
Date:

Cristian Prieto wrote:

>What about use the declaration order as the enum order?, for example: if I
>declare something like: "CREATE ENUM hola ('item1', 'item3', 'item2');"
>-this is just assuming an hypothetical approach to use enum types in this
>way- and the logical order of the items could be 'item1', 'item3', 'item2'
>just because the user decided to create them in this way.
>
>Well, that's just an idea...
>
>  
>

Using the declaration order is exactly what I have done.

cheers

andrew


Re: enums

From
Trent Shipley
Date:
On Thursday 2005-10-27 17:39, Michael Glaesemann wrote:
> On Oct 28, 2005, at 9:23 , Trent Shipley wrote:
> > On Thursday 2005-10-27 16:22, Andrew Dunstan wrote:
> >> Jim C. Nasby wrote:
<snip/>
> Relational databases already have a type for unordered sets: tables.
> IMO, if there's going to be a separate enumerated type, it should be
> more than just an alternative way of defining a set of key-value pairs.
>
> Michael Glaesemann
> grzm myrealbox com

Of course, what is an enumeration except an *ordered* list of key-value pairs; 
that is, a set with a built-in collation.

Glaesemann is right.  To me that implies that no database NEEDS an enumeration 
type.

We just started discussing it because it would greatly enhance MySQL 
migration.  Even more important, enumerations would be tremendously 
convenient (even if they are not strictly necessary).

Enumerations would be good marketing and good engineering.

The debate is about implementation:

1) Pure list (seems to be why MySQL does).  Each enumeration is a list.  If 
you want to create a variant collation for the list, create a new, parallel 
list. 

This is very straightforward but violates the "store once, read many times" 
principle of database design. 


2) Hybrid list + secondary collations.  You create and store an enumeration 
(call it enum_a).  If you want to reorder the underlying set, just declare a 
new collation for the enumeration (call the result enum_b).  enum_b is 
effectively a virtual enumeration.  The relationship of enum_b to enum_a is 
like that between a view and its table.

On the downside, this approach is a theoretical stew.  It should be relatively 
easy to implement.



3) Set + collation functions.  You define a set.  You define a collation for 
the set.  Having declared set_a and a collation_a you can then declare 
enumeration_a.  (Note that the result defined by the developer's collation 
function may not necessarily result in a _per se_ enumeration.)

This has the appeal of separating the symbol declaration from its ordering.  
Furthermore, of all the options it is the most powerful.  Unfortunately, it 
may be verbose, unintuitive, and the most difficult to implement.

There is probably no reason approach #1 or #2 could not be implemented using 
the machinery for approach #3 under the bonnet.  That way we could have 
something like:

CREATE SYMBOL SET  {possibly a disguised create table, but probably not for performance 
reasons}
CREATE COLLATION USING function_name
CREATE ENUMERATION
and
CREATE MYSQL_ENUMERATION. {probably just overload CREATE ENUMERATION}

===============================

http://dev.mysql.com/doc/refman/5.1/en/string-type-overview.html
http://dev.mysql.com/doc/refman/5.1/en/enum.html
(Note that unlike C enumerations MySql enumerations are two way and do some 
context dependent magic.)


Re: enums

From
"Jim C. Nasby"
Date:
On Thu, Oct 27, 2005 at 09:45:05PM -0400, Andrew Dunstan wrote:
> 
> 
> Jim C. Nasby wrote:
> 
> >Andrew, you mentioned that if you want to change the ordering you should
> >just create a new type. What about if you need to change the values that
> >are in the enum? MySQL does (or at least did, it's been some time since
> >I've messed with this) a horrible job at that. There's no way to rename
> >anything; you have to add the new names you want, then do a bulk update,
> >then delete the (now old) names. IMO this is broken.
> > 
> >
> 
> 
> It would just be a standard "ALTER TABLE foo ALTER COLUMN bar TYPE 
> newtype USING expression" operation. You would write a function that 
> took a value of the old type and returned a value of the new type and 
> use a cll to that function in the expression. Since these would be named 
> types, unlike the case in mysql where they are anonymously defined 
> inline, this would present no difficulties at all.

But why force a re-write of the entire table just to change the name of
something?

Or is ALTER COLUMN TYPE smart enough to not touch anything if the
mapping function is equality?

> >Also, if we are going to maintain ordering and mapping (presumably via
> >the internal number that we're storing), then I think we should expose
> >that, at least optionally. So for example, you should be able to define
> >what a specific enum value means. Not everyone will want a linear
> >numbering starting at 0 afterall.
> > 
> >
> 
> What on earth for? Users should not care in the slightest what the 
> internal representation is . Users who want a map where the values are 
> exposed should create a lookup table.
> 
> You keep saying that we are using the internal representation as the 
> ordering. This is simply the wrong way to look at it. The internal 
> representation REFLECTS the ordering; it doesn't impose it. The user has 
> imposed the ordering when defining the type. In my enumkit I did provide 
> a function that gave back the internal representation, but I am not by 
> any means certain that that's a good idea.

Well, someone was arguing that enum should be used as a convenient way
to map human labels on a set of values. To me, that means you should be
able to define exactly what that set of values is.

Personally, I don't see why enum can't just be syntactic sugar on top of
a side-table of values and a foreign key. And I guess a view to hide the
internals from normal viewing. That would certainly allow the most
flexibility, although it probably wouldn't perform as well as what you
wrote.
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


Re: enums

From
Andrew Dunstan
Date:

Ted Rolle wrote:

> This little snippet is great!  The only problem I see is that the 
> enums must be consistent across all modules.
>
> What about loading a variable with a "default" value?  Then it could 
> be adjusted to 'play'.
>
>

You can set a default for a variable using one of these types, as you 
can for any other postgres type:
   create table bar (color rainbow default 'blue');   insert into bar values(default);

As for any postgres type, the default must be a valid value for the type.

cheers

andrew


Re: enums

From
Rod Taylor
Date:
> > It would just be a standard "ALTER TABLE foo ALTER COLUMN bar TYPE 
> > newtype USING expression" operation. You would write a function that 
> > took a value of the old type and returned a value of the new type and 
> > use a cll to that function in the expression. Since these would be named 
> > types, unlike the case in mysql where they are anonymously defined 
> > inline, this would present no difficulties at all.
> 
> But why force a re-write of the entire table just to change the name of
> something?
> 
> Or is ALTER COLUMN TYPE smart enough to not touch anything if the
> mapping function is equality?

Nearly all ALTER TABLE commands are processed in the same way. In fact,
in some cases they are combined (ADD column is broken down then
re-combined later for a single rewrite).

You could pretty easily add this type of logic to skip the rewrite stage
if not needed.

-- 



Re: enums

From
Andrew Dunstan
Date:

Jim C. Nasby wrote:

>On Thu, Oct 27, 2005 at 09:45:05PM -0400, Andrew Dunstan wrote:
>  
>
>>Jim C. Nasby wrote:
>>
>>    
>>
>>>Andrew, you mentioned that if you want to change the ordering you should
>>>just create a new type. What about if you need to change the values that
>>>are in the enum? MySQL does (or at least did, it's been some time since
>>>I've messed with this) a horrible job at that. There's no way to rename
>>>anything; you have to add the new names you want, then do a bulk update,
>>>then delete the (now old) names. IMO this is broken.
>>>
>>>
>>>      
>>>
>>It would just be a standard "ALTER TABLE foo ALTER COLUMN bar TYPE 
>>newtype USING expression" operation. You would write a function that 
>>took a value of the old type and returned a value of the new type and 
>>use a cll to that function in the expression. Since these would be named 
>>types, unlike the case in mysql where they are anonymously defined 
>>inline, this would present no difficulties at all.
>>    
>>
>
>But why force a re-write of the entire table just to change the name of
>something?
>  
>

Because you are not just changing the name of something.

You can't decide you want to insert a new digit between 3 and 4 for an 
integer column and just affect a few rows, rather than change the whole 
set of rows for that field. Nor change the order of the integers. An 
enumerated type is just like that, except that the values are labels 
instead of digit sequences.

>>
>>You keep saying that we are using the internal representation as the 
>>ordering. This is simply the wrong way to look at it. The internal 
>>representation REFLECTS the ordering; it doesn't impose it. The user has 
>>imposed the ordering when defining the type. In my enumkit I did provide 
>>a function that gave back the internal representation, but I am not by 
>>any means certain that that's a good idea.
>>    
>>
>
>Well, someone was arguing that enum should be used as a convenient way
>to map human labels on a set of values. To me, that means you should be
>able to define exactly what that set of values is.
>  
>


Well, that's not my conception at all. Then it is not an enumeration in 
my view.

>Personally, I don't see why enum can't just be syntactic sugar on top of
>a side-table of values and a foreign key. And I guess a view to hide the
>internals from normal viewing. That would certainly allow the most
>flexibility, although it probably wouldn't perform as well as what you
>wrote.
>  
>

The other issue is ease of use.

We used lookup tables in bugzilla when it was converted to work with 
Postgres. But many users will find having to do that annoying, to say 
the least. I think there's a very good case for providing true enums. 
There is a technical part of the puzzle I can't quite see yet,  though :-)

cheers

andrew




Re: enums

From
Rod Taylor
Date:
> The other issue is ease of use.
> 
> We used lookup tables in bugzilla when it was converted to work with 
> Postgres. But many users will find having to do that annoying, to say 
> the least. I think there's a very good case for providing true enums. 

Then why did you use lookup tables instead of a varchar and a
constraint? Probably performance.

A much more general purpose but just as good solution would be the
ability to create a hidden surrogate key for a structure.

CREATE TABLE status (code varchar(20) PRIMARY KEY) WITH SURROGATE;
CREATE TABLE account (name varchar(60), status varchar(20) references
status);

Behind the scenes (transparent to the user) this gets converted to:

CREATE TABLE status (id SERIAL UNIQUE, code varchar(20) PRIMARY KEY)
WITH SURROGATE;
CREATE TABLE account (name varchar(60), status integer references
status(id));


SELECT * FROM account; would be rewritten as
SELECT * FROM (SELECT name, code FROM account JOIN status USING (id)) AS
account;

Enum might be good for a short list of items but something like the
above should be good for any common value that we manually create
surrogate keys for today but without the clutter or the application
needing to know.

If PostgreSQL had an updatable view implementation it would be pretty
simple to implement.

-- 



Re: enums

From
Gregory Maxwell
Date:
On 10/27/05, Andrew Dunstan <andrew@dunslane.net> wrote:
> >That seems counter-intuitive. It's also exposing an implimentation
> >detail (that the enum is stored internally as a number).
>
> No it is not. Not in the slightest. It is honoring the enumeration order
> defined for the type. That is the ONLY correct behaviour, IMNSHO.
> Otherwise, you could just as easily use a domain with a check constraint.
>
> In fact, mysql's behaviour is laughably, even ludicrously, inconsistent:
[snip]
> So for "order by" it honors the enumeration order, but for < it uses the
> lexical ordering. Lovely, eh?

Oh wow. That is broken, I didn't try that case because I figured it
would do it right (i.e. use the enum order).


Re: enums

From
Gregory Maxwell
Date:
On 10/27/05, Andrew Dunstan <andrew@dunslane.net> wrote:
> >Yes, MySQL is broken in some regards, as usual. However, the API isn't
> >bad (except for the fact that it doesn't care what invalid crap you
> >throw at it), and more importantly there are thousands of apps and
> >developers who think around that interface. We should copy it without
> >the brokenness as much as possible unless we have good cause
> >otherwise.
> >
>
> mmm ... no. It is too broken. We should do enums orthogonally with other
> type definitions in PostgreSQL. Where I would like to get to is that we
> have a flavor of CREATE TYPE that will create the enum type for us,
> including all the support that I build into my little kit. And if you
> want to change the enumeration set on a column, you would use ALTER
> TABLE foo ALTER COLUMN  bar  TYPE newtype USING ...

eh, Well that we have a reasonable user extensiable type system is
reasonable reason.  What I was mostly objecting to was the use of
lexical collation the "don't mess with what people already expect"
argument was just the most handy strawman available. :)

And in doing so you could insert a enum in the middle of the existing
list without breaking the values already in the table?  If so that
would be very useful.

> Inline declarations of enums does not strike me as good.

You're right, it's a property of a type.


Re: enums

From
Andrew Dunstan
Date:

Gregory Maxwell wrote:

>And in doing so you could insert a enum in the middle of the existing
>list without breaking the values already in the table?  If so that
>would be very useful.
>
>  
>
You do it by altering the column type, not by altering the type itself.  
MySQL's way of doing this is made necessary by its horrid non-orthogonal 
way of doing enums. Here's how it works in PostgreSQL.  (To make this 
example work I had to add a text conversion - an inadvertant omission 
from the original. This is in a revised version of the enumkit, 
available at the same location.)


andrew=# create table foo (i serial, c rgb);
NOTICE:  CREATE TABLE will create implicit sequence "foo_i_seq" for 
serial column "foo.i"
CREATE TABLE
andrew=# insert into foo (c) values ('blue');
INSERT 8711471 1
andrew=# insert into foo (c) values ('green');
INSERT 8711472 1
andrew=# insert into foo (c) values ('red');
INSERT 8711473 1
andrew=# select * from foo order by c;i |   c  
---+-------3 | red2 | green1 | blue
(3 rows)

andrew=# insert into foo (c) values ('yellow');
ERROR:  invalid input value for enum: "yellow"
andrew=# alter table foo alter column c type rainbow using c::text;
ALTER TABLE
andrew=# select * from foo order by c;i |   c  
---+-------3 | red2 | green1 | blue
(3 rows)

andrew=# insert into foo (c) values ('yellow');
INSERT 8711477 1
andrew=# select * from foo order by c;i |   c   
---+--------3 | red4 | yellow2 | green1 | blue
(4 rows)


cheers

andrew


Re: enums

From
"Merlin Moncure"
Date:
Andrew wrote:
> > Jim C. Nasby wrote:
> >Personally, I don't see why enum can't just be syntactic sugar on top
of
> >a side-table of values and a foreign key. And I guess a view to hide
the
> >internals from normal viewing. That would certainly allow the most
> >flexibility, although it probably wouldn't perform as well as what
you
> >wrote.
> The other issue is ease of use.
>
> We used lookup tables in bugzilla when it was converted to work with
> Postgres. But many users will find having to do that annoying, to say
> the least. I think there's a very good case for providing true enums.
> There is a technical part of the puzzle I can't quite see yet,  though
:-)

Hm, I agree with Jim here.  IMO, enum=FK syntax sugar...enum should be a
lookup table with two fields, one being enum value which is the PK, and
two being the sequencing value.

I think many people are opposed to this approach because they assume
this relationship is via ID-ID link (IIRC this is what mysql does under
the hood).  In fact, the enum table's only purpose is for constraint
checking, not to lookup the value (there is no 'id').

I like the way sequences work.  They are first class SQL objects
although they are normally accessed via helper functions.  Enums could
be the same.  Dependancy could be preserved to the creating table or not
(I prefer not).

Merlin


Re: enums

From
Andrew Dunstan
Date:

Rod Taylor wrote:

>>The other issue is ease of use.
>>
>>We used lookup tables in bugzilla when it was converted to work with 
>>Postgres. But many users will find having to do that annoying, to say 
>>the least. I think there's a very good case for providing true enums. 
>>    
>>
>
>Then why did you use lookup tables instead of a varchar and a
>constraint? Probably performance.
>  
>

To be honest, I forget why. Possible because we also needed to be able 
to get a list of allowed values, although I don't know how one does that 
in mysql. Maybe because it just seemed like a good idea at the time and 
nobody spoke up against it.

>A much more general purpose but just as good solution would be the
>ability to create a hidden surrogate key for a structure.
>
>CREATE TABLE status (code varchar(20) PRIMARY KEY) WITH SURROGATE;
>CREATE TABLE account (name varchar(60), status varchar(20) references
>status);
>
>Behind the scenes (transparent to the user) this gets converted to:
>
>CREATE TABLE status (id SERIAL UNIQUE, code varchar(20) PRIMARY KEY)
>WITH SURROGATE;
>CREATE TABLE account (name varchar(60), status integer references
>status(id));
>
>
>SELECT * FROM account; would be rewritten as
>SELECT * FROM (SELECT name, code FROM account JOIN status USING (id)) AS
>account;
>
>Enum might be good for a short list of items but something like the
>above should be good for any common value that we manually create
>surrogate keys for today but without the clutter or the application
>needing to know.
>
>If PostgreSQL had an updatable view implementation it would be pretty
>simple to implement.
>
>  
>

That won't make it easier to change the ordering or the value set, which 
some people seem concerned about.

But it too might be a nice feature. I suspect it would be a lot more 
work than simple enums, for which there is significant demand.

cheers

andrew


Re: enums

From
"Jim C. Nasby"
Date:
On Thu, Oct 27, 2005 at 11:07:19PM -0400, Rod Taylor wrote:
> > The other issue is ease of use.
> > 
> > We used lookup tables in bugzilla when it was converted to work with 
> > Postgres. But many users will find having to do that annoying, to say 
> > the least. I think there's a very good case for providing true enums. 
> 
> Then why did you use lookup tables instead of a varchar and a
> constraint? Probably performance.
> 
> A much more general purpose but just as good solution would be the
> ability to create a hidden surrogate key for a structure.
> 
> CREATE TABLE status (code varchar(20) PRIMARY KEY) WITH SURROGATE;
> CREATE TABLE account (name varchar(60), status varchar(20) references
> status);
> 
> Behind the scenes (transparent to the user) this gets converted to:
> 
> CREATE TABLE status (id SERIAL UNIQUE, code varchar(20) PRIMARY KEY)
> WITH SURROGATE;
> CREATE TABLE account (name varchar(60), status integer references
> status(id));
> 
> 
> SELECT * FROM account; would be rewritten as
> SELECT * FROM (SELECT name, code FROM account JOIN status USING (id)) AS
> account;
> 
> Enum might be good for a short list of items but something like the
> above should be good for any common value that we manually create
> surrogate keys for today but without the clutter or the application
> needing to know.
> 
> If PostgreSQL had an updatable view implementation it would be pretty
> simple to implement.

I'm not quiet following the WITH SURROGATE bit, but what you've
described certainly looks valuable. Note that I would still want to be
able to get at the raw numeric values in some fasion.
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


Re: enums

From
"Jim C. Nasby"
Date:
On Thu, Oct 27, 2005 at 10:34:57PM -0400, Andrew Dunstan wrote:
> 
> 
> Jim C. Nasby wrote:
> 
> >On Thu, Oct 27, 2005 at 09:45:05PM -0400, Andrew Dunstan wrote:
> > 
> >
> >>Jim C. Nasby wrote:
> >>
> >>   
> >>
> >>>Andrew, you mentioned that if you want to change the ordering you should
> >>>just create a new type. What about if you need to change the values that
> >>>are in the enum? MySQL does (or at least did, it's been some time since
> >>>I've messed with this) a horrible job at that. There's no way to rename
> >>>anything; you have to add the new names you want, then do a bulk update,
> >>>then delete the (now old) names. IMO this is broken.
> >>>
> >>>
> >>>     
> >>>
> >>It would just be a standard "ALTER TABLE foo ALTER COLUMN bar TYPE 
> >>newtype USING expression" operation. You would write a function that 
> >>took a value of the old type and returned a value of the new type and 
> >>use a cll to that function in the expression. Since these would be named 
> >>types, unlike the case in mysql where they are anonymously defined 
> >>inline, this would present no difficulties at all.
> >>   
> >>
> >
> >But why force a re-write of the entire table just to change the name of
> >something?
> > 
> >
> 
> Because you are not just changing the name of something.

No, I was refering specifically to the case of wanting to rename
something. IE: you setup an enum for sky colors (blue, black), and then
the PHB issues an edict that the daytime sky is now green. In this case
you (or at least I) don't want to define a new enum, I just want to
change 'blue' to 'green' in that enum. There's no reason it needs to hit
the table at all.
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


Re: enums

From
Rod Taylor
Date:
On Fri, 2005-10-28 at 13:20 -0500, Jim C. Nasby wrote:
> On Thu, Oct 27, 2005 at 11:07:19PM -0400, Rod Taylor wrote:
> > > The other issue is ease of use.
> > > 
> > > We used lookup tables in bugzilla when it was converted to work with 
> > > Postgres. But many users will find having to do that annoying, to say 
> > > the least. I think there's a very good case for providing true enums. 
> > 
> > Then why did you use lookup tables instead of a varchar and a
> > constraint? Probably performance.
> > 
> > A much more general purpose but just as good solution would be the
> > ability to create a hidden surrogate key for a structure.
> > 
> > CREATE TABLE status (code varchar(20) PRIMARY KEY) WITH SURROGATE;
> > CREATE TABLE account (name varchar(60), status varchar(20) references
> > status);
> > 
> > Behind the scenes (transparent to the user) this gets converted to:
> > 
> > CREATE TABLE status (id SERIAL UNIQUE, code varchar(20) PRIMARY KEY)
> > WITH SURROGATE;
> > CREATE TABLE account (name varchar(60), status integer references
> > status(id));
> > 
> > 
> > SELECT * FROM account; would be rewritten as
> > SELECT * FROM (SELECT name, code FROM account JOIN status USING (id)) AS
> > account;
> > 
> > Enum might be good for a short list of items but something like the
> > above should be good for any common value that we manually create
> > surrogate keys for today but without the clutter or the application
> > needing to know.
> > 
> > If PostgreSQL had an updatable view implementation it would be pretty
> > simple to implement.
> 
> I'm not quiet following the WITH SURROGATE bit, but what you've
> described certainly looks valuable. Note that I would still want to be
> able to get at the raw numeric values in some fasion.

The basic idea is that most of us break out schemas by creating fake
primary keys for the purpose of obtaining performance because using the
proper primary key (single or multiple columns) is often very slow.

The automatic and transparent creation of a surrogate key by PostgreSQL
would allow us to dramatically clean up the presentation of our schema
to the users using the database without the performance hit we currently
get.


It puts surrogate keys (fake primary keys) back to the level of table
spaces, indexes and other performance enhancements where they belong.

-- 



Re: enums

From
Andrew Dunstan
Date:

Jim C. Nasby wrote:

>>>>        
>>>>
>>>But why force a re-write of the entire table just to change the name of
>>>something?
>>>
>>>
>>>      
>>>
>>Because you are not just changing the name of something.
>>    
>>
>
>No, I was refering specifically to the case of wanting to rename
>something. IE: you setup an enum for sky colors (blue, black), and then
>the PHB issues an edict that the daytime sky is now green. In this case
>you (or at least I) don't want to define a new enum, I just want to
>change 'blue' to 'green' in that enum. There's no reason it needs to hit
>the table at all.
>  
>

Well, with enumkit you can't, because the values are hardwired in the 
.so file. With a builtin facility you would be able to, because the 
values would live in the catalog. However, hacking the catalog is not 
something I would encourage - what you are suggesting basically breaks 
the abstraction. But sure, it would be possible. I would not provide an 
SQL level facility to do it, though. My approved way to do it would be 
like the example I gave earlier.

cheers

andrew


Re: enums

From
"Jim C. Nasby"
Date:
On Fri, Oct 28, 2005 at 02:57:03PM -0400, Rod Taylor wrote:
> The basic idea is that most of us break out schemas by creating fake
> primary keys for the purpose of obtaining performance because using the
> proper primary key (single or multiple columns) is often very slow.
> 
> The automatic and transparent creation of a surrogate key by PostgreSQL
> would allow us to dramatically clean up the presentation of our schema
> to the users using the database without the performance hit we currently
> get.
> 
> 
> It puts surrogate keys (fake primary keys) back to the level of table
> spaces, indexes and other performance enhancements where they belong.

Ahh. Yes, that would definately be great to have. Although it would
probably take me months if not years to get used to not seeing a bunch
of _id fields laying all over the place...

Is SURROGATE part of any of the ANSI specs?
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


Re: enums

From
"Jim C. Nasby"
Date:
On Fri, Oct 28, 2005 at 04:12:01PM -0400, Andrew Dunstan wrote:
> Well, with enumkit you can't, because the values are hardwired in the 
> .so file. With a builtin facility you would be able to, because the 
> values would live in the catalog. However, hacking the catalog is not 
> something I would encourage - what you are suggesting basically breaks 
> the abstraction. But sure, it would be possible. I would not provide an 
> SQL level facility to do it, though. My approved way to do it would be 
> like the example I gave earlier.

Why not allow renaming though? It seems like a logical feature to have,
and an easy one to add. What am I missing?
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


Re: enums

From
Andrew Dunstan
Date:

Jim C. Nasby wrote:

>On Fri, Oct 28, 2005 at 04:12:01PM -0400, Andrew Dunstan wrote:
>  
>
>>Well, with enumkit you can't, because the values are hardwired in the 
>>.so file. With a builtin facility you would be able to, because the 
>>values would live in the catalog. However, hacking the catalog is not 
>>something I would encourage - what you are suggesting basically breaks 
>>the abstraction. But sure, it would be possible. I would not provide an 
>>SQL level facility to do it, though. My approved way to do it would be 
>>like the example I gave earlier.
>>    
>>
>
>Why not allow renaming though? It seems like a logical feature to have,
>and an easy one to add. What am I missing?
>  
>

That it is not changing a name, but a value. It's roughly the equivalent 
of inserting a new digit between 3 and 4. Your "feature" breaks the 
abstraction I am trying to implement.

cheers

andrew


Re: enums

From
Rod Taylor
Date:
On Fri, 2005-10-28 at 15:21 -0500, Jim C. Nasby wrote:
> On Fri, Oct 28, 2005 at 02:57:03PM -0400, Rod Taylor wrote:
> > The basic idea is that most of us break out schemas by creating fake
> > primary keys for the purpose of obtaining performance because using the
> > proper primary key (single or multiple columns) is often very slow.
> > 
> > The automatic and transparent creation of a surrogate key by PostgreSQL
> > would allow us to dramatically clean up the presentation of our schema
> > to the users using the database without the performance hit we currently
> > get.
> > 
> > 
> > It puts surrogate keys (fake primary keys) back to the level of table
> > spaces, indexes and other performance enhancements where they belong.
> 
> Ahh. Yes, that would definately be great to have. Although it would
> probably take me months if not years to get used to not seeing a bunch
> of _id fields laying all over the place...
> 
> Is SURROGATE part of any of the ANSI specs?

No, but neither is an index, rollback segment, or table space.  The ANSI
spec doesn't usually deal with performance tweaks that are the
responsibility of the DBA.

-- 



Re: enums

From
"Jim C. Nasby"
Date:
On Fri, Oct 28, 2005 at 04:36:26PM -0400, Rod Taylor wrote:
> On Fri, 2005-10-28 at 15:21 -0500, Jim C. Nasby wrote:
> > On Fri, Oct 28, 2005 at 02:57:03PM -0400, Rod Taylor wrote:
> > > The basic idea is that most of us break out schemas by creating fake
> > > primary keys for the purpose of obtaining performance because using the
> > > proper primary key (single or multiple columns) is often very slow.
> > > 
> > > The automatic and transparent creation of a surrogate key by PostgreSQL
> > > would allow us to dramatically clean up the presentation of our schema
> > > to the users using the database without the performance hit we currently
> > > get.
> > > 
> > > 
> > > It puts surrogate keys (fake primary keys) back to the level of table
> > > spaces, indexes and other performance enhancements where they belong.
> > 
> > Ahh. Yes, that would definately be great to have. Although it would
> > probably take me months if not years to get used to not seeing a bunch
> > of _id fields laying all over the place...
> > 
> > Is SURROGATE part of any of the ANSI specs?
> 
> No, but neither is an index, rollback segment, or table space.  The ANSI
> spec doesn't usually deal with performance tweaks that are the
> responsibility of the DBA.

True, but none of those other things you mention affect external
representation of data. But I was more wondering if we were inventing
syntax on the fly here or not...
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


Re: enums

From
Rod Taylor
Date:
On Fri, 2005-10-28 at 16:28 -0500, Jim C. Nasby wrote:
> On Fri, Oct 28, 2005 at 04:36:26PM -0400, Rod Taylor wrote:
> > On Fri, 2005-10-28 at 15:21 -0500, Jim C. Nasby wrote:
> > > On Fri, Oct 28, 2005 at 02:57:03PM -0400, Rod Taylor wrote:
> > > > The basic idea is that most of us break out schemas by creating fake
> > > > primary keys for the purpose of obtaining performance because using the
> > > > proper primary key (single or multiple columns) is often very slow.
> > > > 
> > > > The automatic and transparent creation of a surrogate key by PostgreSQL
> > > > would allow us to dramatically clean up the presentation of our schema
> > > > to the users using the database without the performance hit we currently
> > > > get.
> > > > 
> > > > 
> > > > It puts surrogate keys (fake primary keys) back to the level of table
> > > > spaces, indexes and other performance enhancements where they belong.
> > > 
> > > Ahh. Yes, that would definately be great to have. Although it would
> > > probably take me months if not years to get used to not seeing a bunch
> > > of _id fields laying all over the place...
> > > 
> > > Is SURROGATE part of any of the ANSI specs?
> > 
> > No, but neither is an index, rollback segment, or table space.  The ANSI
> > spec doesn't usually deal with performance tweaks that are the
> > responsibility of the DBA.
> 
> True, but none of those other things you mention affect external
> representation of data. But I was more wondering if we were inventing
> syntax on the fly here or not...

It isn't supposed to impact the external representation of the data and
generally neither is an ENUM outside of the potential sorting ability. I
was just getting the impression that the big push for enums was to be
able to use a 'real word' but without a performance hit.

A regular old table, foreign key to a varchar gives you the 'real word'
and the surrogate key allows you to do so without a performance hit.

-- 



Re: enums

From
"Jim C. Nasby"
Date:
On Fri, Oct 28, 2005 at 06:10:26PM -0400, Rod Taylor wrote:
> It isn't supposed to impact the external representation of the data and
> generally neither is an ENUM outside of the potential sorting ability. I
> was just getting the impression that the big push for enums was to be
> able to use a 'real word' but without a performance hit.
> 
> A regular old table, foreign key to a varchar gives you the 'real word'
> and the surrogate key allows you to do so without a performance hit.

I think there's probably good use cases for each. If you've got
something small like a status field, 'enum' might be better. For bigger
things, SURROGATE could be nice syntactic sugar.

Now that I finally understand what Andrew's been getting at with enums,
I'm wondering if we might want to expand on the typical usage a bit.
Looking at a plain-old C enum, you're just representing some magic
labels with a number to save space. Things like say, SLRU_PAGE_CLEAN,
SLRU_PAGE_READ_IN_PROGRESS, SLRU_PAGE_CLEAN. Those names are great from
a code standpoint, but they're not something you'd typically want to
display to the user. So, imho a useful extension would be to allow for
enums to contain both the 'machine name' and a 'human name', where the
human name could be renamed freely. To put this in a more concrete
example; I hate the default priorities that ship with bugzilla; P1 - P5.
Is 1 high or is 5? So I always rename them to Very Low, Low ... Very
High. That means making changes both to the database and to the code.
But if Bugzilla was using my idea of an enum then the code would refer
to priorities with P1...P5 (or whatever else they wanted to call it) and
I could easily change the human names to something that can't be
confused.
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


Surrogate keys (Was: enums)

From
Leandro Guimarães Faria Corcete DUTRA
Date:
Rod Taylor <pg <at> rbt.ca> writes:

> The basic idea is that most of us break out schemas by creating fake
> primary keys for the purpose of obtaining performance because using the
> proper primary key (single or multiple columns) is often very slow.

This is one thing I simply can't understand.

If you still declare the natural key(s) as UNIQUEs, you have just made 
performance worse.  Now there are two keys to be checked on UPDATEs and 
INSERTs, two indexes to be updated, and probably a SEQUENCE too.

If you don't, you have just thrown away centralised, optimised integrity 
checking, and will probably have to remember to do a slower SELECT before 
updating.

Certainly decoupling presentation from storage would be nice, but even before 
that generalised use of surrogate keys seems to me a knee-jerk reaction.



Re: Surrogate keys (Was: enums)

From
Michael Glaesemann
Date:
On Jan 13, 2006, at 21:42 , Leandro Guimarães Faria Corcete DUTRA wrote:

> If you still declare the natural key(s) as UNIQUEs, you have just made
> performance worse.  Now there are two keys to be checked on UPDATEs
> and
> INSERTs, two indexes to be updated, and probably a SEQUENCE too.

For UPDATEs and INSERTs, the "proper" primary key also needs to be
checked, but keys are used for more than just checking uniqueness:
they're also often used in JOINs. Joining against a single integer
I'd think it quite a different proposition (I'd think faster in terms
of performance) than joining against, say, a text column or a
composite key.

Michael Glaesemann
grzm myrealbox com





Re: Surrogate keys (Was: enums)

From
Lukas Smith
Date:
Michael Glaesemann wrote:
> 
> On Jan 13, 2006, at 21:42 , Leandro Guimarães Faria Corcete DUTRA wrote:
> 
>> If you still declare the natural key(s) as UNIQUEs, you have just made
>> performance worse.  Now there are two keys to be checked on UPDATEs and
>> INSERTs, two indexes to be updated, and probably a SEQUENCE too.
> 
> For UPDATEs and INSERTs, the "proper" primary key also needs to be 
> checked, but keys are used for more than just checking uniqueness: 
> they're also often used in JOINs. Joining against a single integer I'd 
> think it quite a different proposition (I'd think faster in terms of 
> performance) than joining against, say, a text column or a composite key.

Well this is a balancing decision. You certainly slow down inserts. You 
might also increase the stress on the table because you have to 
translate between the different keys. It also depends on the join type 
you end up doing. It also obviously depends on how large your original 
primary key is. However whatever your situation is: make sure you do not 
end up doing premature optimization.

regards,
Lukas


Re: Surrogate keys (Was: enums)

From
Rod Taylor
Date:
On Fri, 2006-01-13 at 12:42 +0000, Leandro Guimarães Faria Corcete DUTRA
wrote:
> Rod Taylor <pg <at> rbt.ca> writes:
>
> > The basic idea is that most of us break out schemas by creating fake
> > primary keys for the purpose of obtaining performance because using the
> > proper primary key (single or multiple columns) is often very slow.
>
> This is one thing I simply can't understand.
>
> If you still declare the natural key(s) as UNIQUEs, you have just made
> performance worse.  Now there are two keys to be checked on UPDATEs and
> INSERTs, two indexes to be updated, and probably a SEQUENCE too.

Indeed. Using a surrogate key is not free and that is why it would be
something the DBA would specify during table creation.

The main goal would be to give the option of using a surrogate key
without being forced to expose it to the applications using the
database. It is a feature akin to table spaces in that it can help
performance but without the application or standard users knowing why.

--



Re: Surrogate keys (Was: enums)

From
Lukas Smith
Date:
Rod Taylor wrote:

>> If you still declare the natural key(s) as UNIQUEs, you have just made 
>> performance worse.  Now there are two keys to be checked on UPDATEs and 
>> INSERTs, two indexes to be updated, and probably a SEQUENCE too.
> 
> Indeed. Using a surrogate key is not free and that is why it would be
> something the DBA would specify during table creation.
> 
> The main goal would be to give the option of using a surrogate key
> without being forced to expose it to the applications using the
> database. It is a feature akin to table spaces in that it can help
> performance but without the application or standard users knowing why.

Just this morning my father came to me (he is getting into SQL now that 
he is retired) with an issue where a surrogate key probably makes sense. 
He is storing a tree of plant families that can get fairly deep. The 
primary key is the scientific name. In order to improve performance and 
get rid of the recursive lookups he currently does he now wants to use 
materialized paths (<parent name>/<sub name>/<sub sub name>). He decided 
not to go with nested paths since that makes it very hard to hand fix 
things in the tree structure. Obviously using the scientific name in the 
materialized paths can quickly give you a really wide column if you have 
a fairly deep tree. In that case it could be beneficial to introduce a 
surrogate key.

The only annoying bit is that he frequently needs to sync with an 
external database where they use no surrogate key so the import slows 
down because he needs to check if a surrogate key has been introduced 
for every given scientific name before writing to the database.

regards,
Lukas


Re: Surrogate keys (Was: enums)

From
mark@mark.mielke.cc
Date:
On Fri, Jan 13, 2006 at 12:42:55PM +0000, Leandro Guimarães Faria Corcete DUTRA wrote:
> Rod Taylor <pg <at> rbt.ca> writes:
> > The basic idea is that most of us break out schemas by creating fake
> > primary keys for the purpose of obtaining performance because using the
> > proper primary key (single or multiple columns) is often very slow.
> This is one thing I simply can't understand.

> If you still declare the natural key(s) as UNIQUEs, you have just made 
> performance worse.  Now there are two keys to be checked on UPDATEs and 
> INSERTs, two indexes to be updated, and probably a SEQUENCE too.

Not to completely defend the practice - but in some applications,
INSERT is much less frequent than UPDATE, and that UPDATE requires a
unique check on the primary key and the surrogate key, as well as an
update, should be considered (and I believe is considered) a
PostgreSQL performance bug. It's undesirable and unnecessary behaviour
for the majority of uses (where they key does not change as a part of
the update).

> Certainly decoupling presentation from storage would be nice, but even before
> that generalised use of surrogate keys seems to me a knee-jerk reaction.

Yes, I agree. As per a previous thread, I'm one of those using it to
generalize my query / update implementation into common base code. I
have other reasons - but I confess to this being the real reason.

In my case, the cost of maintaining the code that queries / updates is
more expensive than the cost of having an extra unique index, and the
storage and performance impacts this has on my data. :-)

Is my primary reason good on its own, without the other more legitimate
justifications? It's good enough for me. I expect others to strongly
disagree.

Cheers,
mark

-- 
mark@mielke.cc / markm@ncf.ca / markm@nortel.com     __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada
 One ring to rule them all, one ring to find them, one ring to bring them all                      and in the darkness
bindthem...
 
                          http://mark.mielke.cc/



Re: Surrogate keys (Was: enums)

From
Martijn van Oosterhout
Date:
On Sat, Jan 14, 2006 at 11:06:07AM -0500, mark@mark.mielke.cc wrote:
> Not to completely defend the practice - but in some applications,
> INSERT is much less frequent than UPDATE, and that UPDATE requires a
> unique check on the primary key and the surrogate key, as well as an
> update, should be considered (and I believe is considered) a
> PostgreSQL performance bug. It's undesirable and unnecessary behaviour
> for the majority of uses (where they key does not change as a part of
> the update).

Unique check? An index is an index and when you do an UPDATE the new
tuple has to be added to the index. At this point it doesn't matter if
the index is unique or not, all indexes cost something.

Since after the UPDATE the tuple with that primary key appears two (or
more) times in the table, a check needs to be made that they don't
overlap timewise. Are you claiming you could avoid this check and still
guarentee correctness in the face of concurrent transactions?

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Re: Surrogate keys (Was: enums)

From
mark@mark.mielke.cc
Date:
On Sat, Jan 14, 2006 at 09:33:39PM +0100, Martijn van Oosterhout wrote:
> On Sat, Jan 14, 2006 at 11:06:07AM -0500, mark@mark.mielke.cc wrote:
> > Not to completely defend the practice - but in some applications,
> > INSERT is much less frequent than UPDATE, and that UPDATE requires a
> > unique check on the primary key and the surrogate key, as well as an
> > update, should be considered (and I believe is considered) a
> > PostgreSQL performance bug. It's undesirable and unnecessary behaviour
> > for the majority of uses (where they key does not change as a part of
> > the update).
> Unique check? An index is an index and when you do an UPDATE the new
> tuple has to be added to the index. At this point it doesn't matter if
> the index is unique or not, all indexes cost something.

> Since after the UPDATE the tuple with that primary key appears two (or
> more) times in the table, a check needs to be made that they don't
> overlap timewise. Are you claiming you could avoid this check and still
> guarentee correctness in the face of concurrent transactions?

I'm claiming that I agree with this TODO item:
   - Prevent index uniqueness checks when UPDATE does not modify the column
         Uniqueness (index) checks are done when updating a column         even if the column is not modified by the
UPDATE.

Definately, the check is unnecessary. If it was unique before we made
the change, we know it will be unique after we've made the change.
The check shouldn't be performed for the primary key, or for the
surrogate key, if neither of these keys are modified in any way.

Perhaps you are challenging my addition of the phrase "as well as an
update", with a hint on my part, that I feel the update is unnecessary
as well. I may have been wrong to add these 5 words. The MVCC
implementation has numerous costs, and perhaps this is one of them
that cannot be avoided. :-(

Cheers,
mark

-- 
mark@mielke.cc / markm@ncf.ca / markm@nortel.com     __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada
 One ring to rule them all, one ring to find them, one ring to bring them all                      and in the darkness
bindthem...
 
                          http://mark.mielke.cc/



Re: Surrogate keys (Was: enums)

From
Greg Stark
Date:
Leandro Guimarães Faria Corcete DUTRA <leandro@dutra.fastmail.fm> writes:

> Certainly decoupling presentation from storage would be nice, but even before 
> that generalised use of surrogate keys seems to me a knee-jerk reaction.

I hate knee-jerk reactions too, but just think of all the pain of people
dealing with databases where they used Social Security numbers for primary
keys. I would never use an attribute that represents some real-world datum as
a primary key any more.

In my experience there are very few occasions where I want a real non-sequence
generated primary key. I've never regretted having a sequence generated
primary key, and I've certainly had occasions to regret not having one.

-- 
greg



Re: Surrogate keys (Was: enums)

From
"Jim C. Nasby"
Date:
On Sat, Jan 14, 2006 at 07:28:21PM +0900, Michael Glaesemann wrote:
> 
> On Jan 13, 2006, at 21:42 , Leandro Guimar?es Faria Corcete DUTRA wrote:
> 
> >If you still declare the natural key(s) as UNIQUEs, you have just made
> >performance worse.  Now there are two keys to be checked on UPDATEs  
> >and
> >INSERTs, two indexes to be updated, and probably a SEQUENCE too.
> 
> For UPDATEs and INSERTs, the "proper" primary key also needs to be  
> checked, but keys are used for more than just checking uniqueness:  
> they're also often used in JOINs. Joining against a single integer  
> I'd think it quite a different proposition (I'd think faster in terms  
> of performance) than joining against, say, a text column or a  
> composite key.

a) the optimizer does a really poor job on multi-column index statistics
b) If each parent record will have many children, the space savings from
using a surrogate key can be quite large
c) depending on how you view things, putting actual keys all over the
place is denormalized

Generally, I just use surrogate keys for everything unless performance
dictates something else.
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


Re: Surrogate keys (Was: enums)

From
Leandro Guimarães Faria Corcete DUTRA
Date:
Jim C. Nasby <jnasby <at> pervasive.com> writes:

> a) the optimizer does a really poor job on multi-column index statistics

So it should be fixed?

And there are a *lot* of singular, natural keys.


> b) If each parent record will have many children, the space savings from
> using a surrogate key can be quite large

Not such a common case.


> c) depending on how you view things, putting actual keys all over the
> place is denormalized

How come?  Never!


> Generally, I just use surrogate keys for everything unless performance
> dictates something else.

What I am proposing is the reverse: use natural keys for everything unless 
performance dictates something else.

In support of my PoV: 
http://blogs.ittoolbox.com/database/soup/archives/007327.asp?rss=1




Re: Surrogate keys (Was: enums)

From
Leandro Guimarães Faria Corcete DUTRA
Date:
Greg Stark <gsstark <at> mit.edu> writes:

> I hate knee-jerk reactions too, but just think of all the pain of people
> dealing with databases where they used Social Security numbers for primary
> keys. I would never use an attribute that represents some real-world datum as
> a primary key any more.

I am not familiar with the situation.


> In my experience there are very few occasions where I want a real non-sequence
> generated primary key. I've never regretted having a sequence generated
> primary key, and I've certainly had occasions to regret not having one.

http://blogs.ittoolbox.com/database/soup/archives/007327.asp?rss=1




Re: enums

From
Leandro Guimarães Faria Corcete DUTRA
Date:
Andrew Dunstan <andrew <at> dunslane.net> writes:

> If people would like to play, I have created a little kit to help in 
> creating first class enum types in a few seconds.

Isn't what we actually want possreps?



Re: Surrogate keys (Was: enums)

From
Martijn van Oosterhout
Date:
On Wed, Jan 18, 2006 at 01:08:53PM +0000, Leandro Guimarães Faria Corcete DUTRA wrote:
> Jim C. Nasby <jnasby <at> pervasive.com> writes:
> > Generally, I just use surrogate keys for everything unless performance
> > dictates something else.
>
> What I am proposing is the reverse: use natural keys for everything unless
> performance dictates something else.
>
> In support of my PoV:
> http://blogs.ittoolbox.com/database/soup/archives/007327.asp?rss=1

Interesting. However, in my experience very few things have "natural
keys". There are no combination of attributes for people, phone calls
or even real events that make useful natural keys.

You don't say what the primary key on your events table was but I can
see one possibility:

(place,datetime)

A unique on this won't prevent overlapping events. Sure, it'll get rid
of the obvious duplicates but won't solve the problem. It also fails
the criteria that keys stable, since you can move events. You do need a
constraint on that table, but a unique constraint isn't it.

While I agree with your statement that it's the abuse of these keys
thats the problem, I find people are far too likely to see natural keys
where none exist.

BTW, the way I deal with people mixing up surrogate keys is by (usually
by chance) having the sequences for different tables start at wildly
different points. By starting one counter at a million and the other at
one, the chances that you'll be able to mix them up is reduced. On some
systems I can even identify the table a key comes from by looking at the
number, just because I know only one table has keys in the 30,000
range.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Re: enums

From
Andrew Dunstan
Date:

Leandro Guimarães Faria Corcete DUTRA wrote:

>Andrew Dunstan <andrew <at> dunslane.net> writes:
>
>  
>
>>If people would like to play, I have created a little kit to help in 
>>creating first class enum types in a few seconds.
>>    
>>
>
>Isn't what we actually want possreps?
>
>
>
>  
>

You appear to be responding to mail from months ago. Please catch up 
before replying, so we don't rehash old discussions. As previously 
discussed, I intend to do first class enums for the next release of 
postgres, if I get enough time. Enumkit was just a very small step along 
the research road, although it is useful in itself, which is why I 
released it.

cheers

andrew


Re: Surrogate keys (Was: enums)

From
Michael Glaesemann
Date:
On Jan 18, 2006, at 22:08 , Leandro Guimarães Faria Corcete DUTRA wrote:

> Jim C. Nasby <jnasby <at> pervasive.com> writes:
>
>> a) the optimizer does a really poor job on multi-column index
>> statistics
>
> So it should be fixed?

Of course! Patches welcome!

Michael Glaesemann
grzm myrealbox com





Re: Surrogate keys (Was: enums)

From
Greg Stark
Date:
Leandro Guimarães Faria Corcete DUTRA <leandro@dutra.fastmail.fm> writes:

> Greg Stark <gsstark <at> mit.edu> writes:
> 
> > I hate knee-jerk reactions too, but just think of all the pain of people
> > dealing with databases where they used Social Security numbers for primary
> > keys. I would never use an attribute that represents some real-world datum as
> > a primary key any more.
> 
> I am not familiar with the situation.

The US gov't handed out unique numbers to every worker for their old age
pension program. Many early database designers thought that made a wonderful
natural primary key.

It turns out that:

a) not everyone has a social insurance number: when their business expanded to
include foreign nationals these databases had to make up fake social insurance
numbers.

b) Occasionally people's social insurance numbers change, either because they
got it wrong in the first place or because of identity theft later on. Even
dealing with it changing isn't good enough because the old records don't
disappear; the person essentially has *two* social insurance numbers.

c) For security reasons it turns out to be a bad idea to be passing around
social insurance numbers in the first place. So these database designers had a
major problem adapting when people started refusing to give them social
insurance numbers or complaining when their application leaked their social
insurance number.

In short, what seemed like the clearest possible example of a natural primary
key became a great example of how hard it is to deal with changing business
requirements when you've tied your database design to the old rules. Using
natural primary keys makes an iron-clad design assumption that the business
rules surrounding that datum will never change. And the one thing constant in
business is that business rules change.

In the past I've used "username" as a primary key for a users table, what
could be safer? 

Later we had to create a sequence generated userid column because some data
partners couldn't handle an text column without corrupting it. And of course
one day the question arose whether we could handle someone wanting to change
their username. Then another day we were asked whether we could have two
different people with the same username if they belonged to separate branded
subsites.



-- 
greg



Re: Surrogate keys (Was: enums)

From
"Jim C. Nasby"
Date:
On Wed, Jan 18, 2006 at 01:08:53PM +0000, Leandro Guimar??es Faria Corcete DUTRA wrote:
> > b) If each parent record will have many children, the space savings from
> > using a surrogate key can be quite large
> 
> Not such a common case.
Hmmm...

Many blog entries per user... Many blog comments per entry

Many PO's per customer... many line items per PO...
Etc., etc. I would argue that one-many relationships are far more common
than one-one, and it's very common for an integer ID to be a more
compact representation than a real key.

> > c) depending on how you view things, putting actual keys all over the
> > place is denormalized
> 
> How come?  Never!
Huh?

One of the tenants of normalization is that you don't repeat data. You
don't use customer name in your PO table, because it's asking for
problems; what if a customer changes names (as just one example).

> > Generally, I just use surrogate keys for everything unless performance
> > dictates something else.
> 
> What I am proposing is the reverse: use natural keys for everything unless 
> performance dictates something else.
> 
> In support of my PoV: 
> http://blogs.ittoolbox.com/database/soup/archives/007327.asp?rss=1

Read the bottom of it:

"I am not saying that you should avoid autonumber surrogate keys like an
SCO executive. The danger is not in their use but in their abuse. The
"events_id" column in the "events" table didn't give us any trouble
until we began to rely on it as the sole key for the table. The
accounting application gave us problems because we were using the ID as
the entire handle for the records. That crossed the line from use to
misuse, and we suffered for it."

To paraphrase, the issue isn't that surrogate keys were used for RI; the
issue is that proper keys were not setup to begin with. Does it make
sense to have a customer table where customer_name isn't unique? Almost
certainly not. But that's just one possible constraint you might put on
that table. To put words in Josh's mouth, the issue isn't with using a
surrogate key, it's with not thinking about what constraints you should
be placing on your data.

Take a look at cbk's comment; he does a great job of summing the issue
up.
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


Re: Surrogate keys (Was: enums)

From
Josh Berkus
Date:
Martjin,

> Interesting. However, in my experience very few things have "natural
> keys". There are no combination of attributes for people, phone calls
> or even real events that make useful natural keys.

I certainly hope that I never have to pick up one of your projects.   A 
table without a natural key is a data management disaster.   Without a 
key, it's not data, it's garbage.

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco


Re: Surrogate keys (Was: enums)

From
Martijn van Oosterhout
Date:
On Wed, Jan 18, 2006 at 03:58:50PM -0800, Josh Berkus wrote:
> Martjin,
>
> > Interesting. However, in my experience very few things have "natural
> > keys". There are no combination of attributes for people, phone calls
> > or even real events that make useful natural keys.
>
> I certainly hope that I never have to pick up one of your projects.   A
> table without a natural key is a data management disaster.   Without a
> key, it's not data, it's garbage.

???

Please provides natural keys for any of the following:

- A Person
- A phone call: (from,to,date,time,duration) is not enough
- A physical address
- A phone line: (phone numbers arn't unique over time)
- An internet account: (usernames not unique over time either)

In any of these either misspellings, changes of names, ownership or
even structure over time render the obvious useless as keys. There are
techniques for detecting and reducing duplication but the point is that
for any of these duplicates *can* be valid data.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Re: Surrogate keys (Was: enums)

From
Josh Berkus
Date:
Martjin,

> In any of these either misspellings, changes of names, ownership or
> even structure over time render the obvious useless as keys. There are
> techniques for detecting and reducing duplication but the point is that
> for any of these duplicates *can* be valid data.

Please point me out where, in the writings of E.F. Codd or in the SQL 
Standard, it says that keys have to be immutable for the life of the row.

Duplicate *values* can be valid data.  Duplicate *tuples* show some 
serious flaws in your database design.  If you have a personnel 
directory on which you've not bothered to define any unique constraints 
other than the ID column, then you can't match your data to reality.  If 
you have two rows with the same first and last name, you don't know if 
they are two different people or the same person, duplicated.  Which 
will be a big problem come paycheck time.

Per E.F. Codd, each tuple is a *unique* predicate (or key) comprising a 
set of values definining a *unique* data entity.  i.e. "The employeee 
named "John" "Little" at extension "4531".  There is nothing anywhere 
said about keys never changing.

This is Databases 101 material.  Really!

--Josh



Re: Surrogate keys (Was: enums)

From
Martijn van Oosterhout
Date:
On Thu, Jan 19, 2006 at 10:09:26AM -0800, Josh Berkus wrote:
> Martjin,
>
> >In any of these either misspellings, changes of names, ownership or
> >even structure over time render the obvious useless as keys. There are
> >techniques for detecting and reducing duplication but the point is that
> >for any of these duplicates *can* be valid data.
>
> Please point me out where, in the writings of E.F. Codd or in the SQL
> Standard, it says that keys have to be immutable for the life of the row.

Possibly nowhere. But when you send invoices to customers, any details
on there *are* immutable. Sure, in your database you don't care if
things change, but then they don't match reality anymore do they?

> Duplicate *values* can be valid data.  Duplicate *tuples* show some
> serious flaws in your database design.  If you have a personnel
> directory on which you've not bothered to define any unique constraints
> other than the ID column, then you can't match your data to reality.  If
> you have two rows with the same first and last name, you don't know if
> they are two different people or the same person, duplicated.  Which
> will be a big problem come paycheck time.

I never said there were duplicate tuples, just that the data has no
natural keys. The tuples are unique because there's a surrogate key. It
is entirely possible to have two people with the same first name, last
name and date of birth. Rather uncommon, but the database must be able
to support it.

I don't understand your example though. If you have a personnel
directory with two rows with the same first and last name, what does
that tell you. Nothing. You have to go find out whether there really
are two of those people or not. You can simplify the process by taking
into account the fact that it's very unlikely, but a unique constraint
is not the answer. Besides, it's far more likely the same person will
appear twice with two different spellings of their name. :)

Anyway, the discussion was about surrogate vs natural keys. Nothing
here has convinced me that there are any useful natural keys to be
found in the examples I gave. Most of the examples I gave come from a
system I had to maintain where some designer had assumed there was some
kind of natural key and in *each* and *every* case it caused
problems...

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Re: Surrogate keys (Was: enums)

From
Leandro Guimarães Faria Corcete Dutra
Date:
Em Qui, 2006-01-19 às 22:29 +0100, Martijn van Oosterhout escreveu:
> Possibly nowhere. But when you send invoices to customers, any details
> on there *are* immutable. Sure, in your database you don't care if
> things change, but then they don't match reality anymore do they?
Then what you need is a temporal database -- at least some form of
historical records.  Nothing to do with keys in themselves.


> I never said there were duplicate tuples, just that the data has no
> natural keys. The tuples are unique because there's a surrogate key.
This does not guarantee uniqueness, as the key is artificially and
internally generated.


> It
> is entirely possible to have two people with the same first name, last
> name and date of birth. Rather uncommon, but the database must be able
> to support it.
And the way to support it is to take into account additional data --
place of birth, parents' data etc -- as part of the candidate keys.  Not
to allow duplicates.


> I don't understand your example though. If you have a personnel
> directory with two rows with the same first and last name, what does
> that tell you. Nothing. You have to go find out whether there really
> are two of those people or not.
And how will you do that if you don't store additional data?


> You can simplify the process by taking
> into account the fact that it's very unlikely, but a unique constraint
> is not the answer.
Oh yes, it is.  They only one.


> Besides, it's far more likely the same person will
> appear twice with two different spellings of their name. :)
So what?

-- 
+55 (11) 5685 2219               xmpp:leandrod@jabber.org
+55 (11) 9406 7191                       Yahoo!: lgcdutra    
+55 (11) 5686 9607         MSN: leandro@dutra.fastmail.fm
+55 (11) 4390 5383                      ICQ/AIM: 61287803