Re: Basic Q on superfluous primary keys - Mailing list pgsql-performance

From Bill Moran
Subject Re: Basic Q on superfluous primary keys
Date
Msg-id 20070414075947.e5fbeec8.wmoran@collaborativefusion.com
Whole thread Raw
In response to Basic Q on superfluous primary keys  ("Kynn Jones" <kynnjo@gmail.com>)
Responses Re: Basic Q on superfluous primary keys
List pgsql-performance
In response to "Kynn Jones" <kynnjo@gmail.com>:

> Consider these two very similar schemas:
>
> Schema 1:
>
>
> CREATE TABLE foo (
>   id serial PRIMARY KEY,
>   frobnitz character(varying 100) NOT NULL UNIQUE
> );
>
>
> CREATE TABLE bar (
>   id serial PRIMARY KEY,
>   foo_id int REFERENCES foo(id)
> )
>
>
> Schema 2:
>
>
> CREATE TABLE foo (
>   frobnitz character(varying 100) PRIMARY KEY
> );
>
>
> CREATE TABLE bar (
>   id serial PRIMARY KEY,
>   frobnitz character(varying 100) REFERENCES foo(frobnitz)
> )
>
>
>
>
> The two situations are semantically identical: each record in table bar
> refers to a record in table foo.  The difference is that in the first
> schema, this referencing is done through an "artificial" serial-integer
> primary key, while in the second schema this reference is done through a
> data field that happens to be unique and not null, so it can serve as
> primary key.

The first case is call a "surrogate key".  A little googling on that term
will turn up a wealth of discussion -- both for and against.

> I find Schema 1 awkward and unnatural; more specifically, foo.id seems
> unnecessary in light of the non-null uniqueness of foo.frobnitz.  But I
> remember once reading that "long" fields like foo.frobnitz did not make good
> primary keys.

I had a discussion about this recently on the Drupal mailing lists, at the
end of which I promised to do some benchmarking to determine whether or
not text keys really do hurt performance of indexes.  Unfortunately, I
still haven't followed through on that promise -- maybe I'll get to it
tomorrow.

> Is the field foo.id in Schema 1 superfluous?  For example, wouldn't the
> referencing from bar to foo really be done "behind the scenes" through some
> hidden field (oid?) instead of through the frobnitz text field?  Which of
> the two schemas would give better perfornance?

--
Bill Moran
Collaborative Fusion Inc.

wmoran@collaborativefusion.com
Phone: 412-422-3463x4023

****************************************************************
IMPORTANT: This message contains confidential information and is
intended only for the individual named. If the reader of this
message is not an intended recipient (or the individual
responsible for the delivery of this message to an intended
recipient), please be advised that any re-use, dissemination,
distribution or copying of this message is prohibited. Please
notify the sender immediately by e-mail if you have received
this e-mail by mistake and delete this e-mail from your system.
E-mail transmission cannot be guaranteed to be secure or
error-free as information could be intercepted, corrupted, lost,
destroyed, arrive late or incomplete, or contain viruses. The
sender therefore does not accept liability for any errors or
omissions in the contents of this message, which arise as a
result of e-mail transmission.
****************************************************************

pgsql-performance by date:

Previous
From: "Kynn Jones"
Date:
Subject: Basic Q on superfluous primary keys
Next
From: Tom Lane
Date:
Subject: Re: [HACKERS] choose_bitmap_and again (was Re: Strangely Variable Query Performance)