Re: varchar as primary key - Mailing list pgsql-general

From Merlin Moncure
Subject Re: varchar as primary key
Date
Msg-id b42b73150705041023n5a583b47ra5a6064fb128e02f@mail.gmail.com
Whole thread Raw
In response to Re: varchar as primary key  ("Alexander Staubo" <alex@purefiction.net>)
List pgsql-general
On 5/3/07, Alexander Staubo <alex@purefiction.net> wrote:
> PostgreSQL uses B-trees for its indexes, insertion time is logarithmic
> regardless of the type of the key, but strings have a larger overhead
> since they involve character comparisons; (i - j) is a lot faster than
> strcmp(i, j). If you do go for strings, I would suggest that the
> beginning of the key be statistically distributed as widely as
> possible; ie., avoid common prefixes.

I think the performance benefits of i - j over strcmp(i,j) are mostly
irrelevant, locale issues aside.  The main reason why integer keys can
be faster is because the index is smaller and puts less pressure on
cache.  This has to stacked up against the fact you are often hitting
the varchar index anyways for sorting and filtering purposes (swapping
a int for text index is only a guaranteed win if you can drop the text
index completely).  So, by using integers from performance perspective
we are mostly trying to prevent a cache miss (during which time a
computer might perform 100k strcmp operations).  If there is also a
varchar index, and it is used for various queries, it may actually be
faster to drop the integer index altogether because it is competing
with cache resources with the integer index.  Unfortunately, this is
more often the case than not in my experience.

As solid state technologies continue to mature and near zero latency
storage systems become widespread, this advantage will lessen as the
penalty for a cache miss becomes much less.

merlin

pgsql-general by date:

Previous
From: "Mike Frysinger"
Date:
Subject: Re: cant get pg_dump/pg_restore to behave
Next
From: Jeff Davis
Date:
Subject: Re: varchar as primary key