I know this is an old topic and also a religious one so I won't get
into the debate, but I thought up one possible solution that would make
almost everybody happy and was wondering if any PostgreSQL hackers out
there had any thoughts.
I was wondering if, considering that an entity can only have a single
primary key, a combination of both methods could be possible. For
example, using a surrogate key exclusively for maintaining entity
relationships while at the same time using a natural candidate key as a
user-friendly method of identifying records. The reason I ask is that
it seems like this is the sort of thing that even a fully SQL-compliant
DBMS could do internally to compensate for the performance issues with
using large natural keys in relationships.
I know, for example, that by default PostgreSQL assigns every record a
small unique identifier called an OID. It seems reasonable then, that
when the DBA creates a cascading foreign key to a record, that the DBMS
could, instead of storing the record's entire natural key, store only a
reference to the OID and abstract/hide this behavior from the
environment just as PostgreSQL does with its OID feature now. Of
course, this would require that the OID be guaranteed unique, which I
don't beleave is the case in the current versions.
This would completely eliminate concerns related to the performance of
cascading updates because no actual cascade would take place, but
rather the update would affect all referencing records implicitly via
the abstraction. Additionally, storage concerns related to large
foreign keys would similarly be addressed since only the OID reference
would be stored. It would then, as far as I can tell, be completely
reasonable to use a person's mailing address as well as other large and
numerous fields as primary keys so long as the resulting index
performance would be acceptable.
As far as index performance is concerned, for many tables "unique"
indexes already exist on these large natural candidate keys, and I
don't think any index performance would be lost by turning those unique
indexes into primary key indexes.
Is such a "behind the scenes" abstraction of large foreign keys
practical? If so, I think such a mechanism would be a very powerful
feature for physical data modelers working with PostgreSQL.