Thread: int id's helpful for indexing, or just use text names?

int id's helpful for indexing, or just use text names?

From

george young

Date:

15 October 2002, 15:16:59

[linux, postgresql 7.2, 500MHz * 4 xeon cpu's, 1GB ram, hardware raid]
My current db has serveral instances of something like: table foos(fooid int2, fooname text, foouser text, foobar int2
referencesbars(barid))

table bars(barid int2, barname text, barcolor text, primary key(barid) )

etc, where foonames and barnames are known to be, say <20 characters long.
And the fooid's and barid's are arbitrary ints only known inside the db.

The original reason for these numeric id's, (in another db system long long ago),
was to conserve space(now irrelvant with 120G disks) and to make searching and
index usage more efficient. Recently, there is increasing call for new apps and
even ad-hoc queries.(Thank goodness people are finally interested in this data!)
The artificial numeric id's make it a lot harder for naive users to understand
the data structure, and sometimes actually requires an extra order of joins.

The question is: would I be better off losing all those integer ids and just using
the text names as primary indices? Is there much performance lost comparing
text strings for every index operation?

My db is not huge: longest table has 100k tuples, biggest table has 1k pages, total
pg_dump output is 51 Mbytes. Typical activity: ~6000 updates and inserts/day,
30,000 selects/day.

-- I cannot think why the whole bed of the ocean isnot one solid mass of oysters, so prolific they seem. Ah,I am
wandering!Strange how the brain controls the brain!-- Sherlock Holmes in "The Dying Detective"

Re: int id's helpful for indexing, or just use text names?

From

Tom Lane

Date:

15 October 2002, 17:07:00

george young <gry@ll.mit.edu> writes:
> The question is: would I be better off losing all those integer ids
> and just using the text names as primary indices?  Is there much
> performance lost comparing text strings for every index operation?

Strings would be slower, but possibly not by enough to notice.  The
real question you should ask yourself is whether your text names are
really good candidates to be primary keys.  Are there ever any
duplicates?  Do you ever rename objects?  If your answer to either
is "yes" then the names won't do as unique identifiers.

There's an old saying that meaningful keys are bad database design;
check the mailing list archives for some examples.  (F'r instance,
I seem to recall a story about a bank that embedded branch numbers
into account numbers, and then had terrible troubles anytime a
customer moved...)
        regards, tom lane