Re: Basic Q on superfluous primary keys - Mailing list pgsql-performance

From Jeff Davis
Subject Re: Basic Q on superfluous primary keys
Date
Msg-id 1176923324.4152.145.camel@dogma.v10.wvs
Whole thread Raw
In response to Re: Basic Q on superfluous primary keys  ("Craig A. James" <cjames@modgraph-usa.com>)
List pgsql-performance
On Tue, 2007-04-17 at 21:06 -0700, Craig A. James wrote:
> Merlin Moncure wrote:
> > In the context of this debate, I see this argument all the time, with
> > the implied suffix: 'If only we used integer keys we would not have
> > had this problem...'.  Either the customer identifies parts with a
> > part number or they don't...and if they do identify parts with a
> > number and recycle the numbers, you have a problem...period.
>
> On the contrary.  You create a new record with the same part number.  You mark the old part number "obsolete".
Everythingelse (the part's description, and all the relationships that it's in, such as order history, catalog
inclusion,revision history, etc.) is unaffected.  New orders are placed against the new part number's DB record; for
safetythe old part number can have a trigger that prevent new orders from being placed. 
>
> Since the part number is NOT the primary key, duplicate part numbers are not a problem.  If you had instead used the
partnumber as the primary key, you'd be dead in the water. 
>
> You can argue that the customer is making a dumb decision by reusing catalog numbers, and I'd agree.  But they do it,
andas database designers we have to handle it.  In my particular system, we aggregate information from several hundred
companies,and this exact scenario happens frequently.  Since we're only aggregating information, we have no control
overthe data that these companies provide.  If we'd used catalog numbers for primary keys, we'd have big problems. 
>

Storing data is easy.

The difficulty lies in storing data in such a way that your assumptions
about the data remain valid and your queries still answer the questions
that you think they answer.

Because an internal ID field has no meaning outside of the database
(some auto-generated keys do have meaning outside the database, but I'm
not talking about those), you can't effectively query based on an
internal id any more than you can query by the ctid. So what do you
query by then? You query by natural keys anyway. Internal id fields are
an implementation detail related to performance (the real topic of this
discussion).

If you have two parts with the same part id, what does that mean? Sure,
you can store the data, but then the queries that assumed that data was
unique no longer hold. Sometimes you need two parts with the same part
id, but you have to know the meaning in order to query based on that
data.

Let me ask these questions:
 - Do you think that all of your relations have an internal id?
 - Do you think that all the internal ids you use are unique in the
relations in which they appear?

If you answer "yes" to either question, consider that every query on
that data is also a relation and so are subselects and intermediate
results. Do those all have an id? If not, why not? How do you join a
virtual relation to a physical relation if the virtual relation has no
internal id? Is the id field still unique in the result of a join or
Cartesian product?

Regards,
    Jeff Davis



pgsql-performance by date:

Previous
From: "Jim C. Nasby"
Date:
Subject: Re: Long running transactions again ...
Next
From: Hannes Dorbath
Date:
Subject: Re: Equivalents in PostgreSQL of MySQL's "ENGINE=MEMORY" "MAX_ROWS=1000"