Re: Using varchar primary keys. - Mailing list pgsql-general
From | Julian |
---|---|
Subject | Re: Using varchar primary keys. |
Date | |
Msg-id | 5158FA68.5040604@internode.on.net |
In response to | Re: Using varchar primary keys. (Modulok <modulok@gmail.com>) |
Responses | Re: Using varchar primary keys. |
List | pgsql-general |
On 01/04/13 12:19, Modulok wrote:
> On 3/31/13, Tim Uckun <timuckun@gmail.com> wrote:
>> Consider the following scenario.
>>
>> I have a typical tagging structure. There is a table called tags, there is
>> a table called taggings. The taggings table acts as a many-to-many join
>> table between the taggers and the tags.
>>
>> The tags table has two fields, id and tag. id is a serial data type.
>>
>> The taggings table has a tag_id field along with the context, tagger_id etc.
>>
>> I don't think there is even one query in this scenario which does not join
>> the taggings table to the tags table so I am wondering if there is any
>> advantage at all of having that id field in the tags table. Why shouldn't I
>> just put the tag itself as the primary key? The tags are quite short so if
>> pg is smart about it I would guess they would take up less space than an
>> integer in some cases.
>>
>> I guess the same scenario happens in many of my lookup tables. I have all
>> these tables with just two fields in them: an id field and then some varchar
>> field. Then every time I want to display that record I join with five
>> other tables so I can look up all the associated lookup tables and display
>> the text value to the user.
>>
>> Given that you can still set foreign keys and cascade commands to adjust
>> child records, either way the tables would be properly normalized, so I am
>> wondering what I am gaining by using these serial ID fields.
>
> You can. Arguably, if the natural key is relatively short and consists of only
> one column, E.F. Codd would probably have encouraged you to use it. I'd leave the
> serial column for convenience of identifying individual records on the command
> line, especially if the varchar can store characters that are not easily typed
> in a console.
>
> As for the primary key being a 'natural key' vs. a 'synthetic' one,
> that's a huge debate akin to emacs vs. vi. Depending on your situation, there's
> nothing wrong with using natural keys. It is discussed at length and
> in some cases encouraged here:
>
> http://database-programmer.blogspot.com/2008/01/database-skills-sane-approach-to.html
>
> -Modulok-

I disagree with almost everything in that blog post.

Shouldn't the first question be why even a bigserial (int8) would not
suffice as a unique identifier for row data? There are cases where it
won't; what are they? (There's a great blog post by a Twitter developer on
the development of their unique identifier (PK), but I couldn't find it.)

Natural keys have a purpose, but when do they need to exist in the
database? For example, a surrogate key that is exposed to the world may get
wrapped in a check-bit algorithm, or perhaps even an HMAC-encoded key, to
validate the genuineness of incoming requests before an expensive database
lookup is initiated. So is there a need to store the natural keys generated
in the application? They might be useful to external parties, though they
may even call them "foos_pks".

What about the natural keys of an external source? Should those be stored in
a lookup table along with the integer-based surrogate key? Maybe I'll call
it "bars_pks".

As to the original question, my response may seem off topic, but exposing
PKs to the world has some pitfalls (and not all to do with security), and
character-based keys have pitfalls compared to surrogate keys that stay
out of the displayed data.

    CREATE TABLE tags (tag text PRIMARY KEY);
    CREATE TABLE child_tag (parent_tag text REFERENCES tags);
    INSERT INTO tags VALUES ('fu');
    INSERT INTO child_tag VALUES ('fu');
    -- Spelling error: rename the tag that child_tag still references.
    UPDATE tags SET tag = 'foo' WHERE tag = 'fu';

The UPDATE will fail with a foreign key violation unless the reference is
declared ON UPDATE CASCADE.

Some things to think about.

Regards,
Jules.
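For comparison, here is a minimal sketch of the ON UPDATE CASCADE variant of the example above. It assumes PostgreSQL and reuses the same illustrative tags and child_tag tables; the only change is the cascade clause on the foreign key, which is one way the reference might be declared, not something taken from the original schema.

```sql
-- Sketch only: same illustrative tables as above, with the foreign key
-- declared ON UPDATE CASCADE (an assumption, not the original schema).
CREATE TABLE tags (tag text PRIMARY KEY);
CREATE TABLE child_tag (
    parent_tag text REFERENCES tags (tag) ON UPDATE CASCADE
);

INSERT INTO tags VALUES ('fu');
INSERT INTO child_tag VALUES ('fu');

-- Fix the spelling error; the new key value is propagated to child_tag
-- instead of raising a foreign key violation.
UPDATE tags SET tag = 'foo' WHERE tag = 'fu';

-- Both tables should now hold the corrected value.
SELECT tag FROM tags;
SELECT parent_tag FROM child_tag;
```

The cascade rewrites every referencing row, so correcting a widely used tag touches each child row that points at it, whereas with a surrogate key only the single row in tags would change.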