Re: Varchar pkey instead of integer - Mailing list pgsql-performance
From | Craig Ringer |
---|---|
Subject | Re: Varchar pkey instead of integer |
Date | |
Msg-id | 4833C285.5000603@postnewspapers.com.au Whole thread Raw |
In response to | Varchar pkey instead of integer ("Robins Tharakan" <tharakan@gmail.com>) |
Responses |
Re: Varchar pkey instead of integer
|
List | pgsql-performance |
Robins Tharakan wrote: > Hi, > > I am currently designing a database and wanted to know something that may > sound trivial, but I thought its still good to confirm before dumping > millions of rows in it. > > The design requires a few master tables with very limited rows, for e.g. > currency_denomination table could at the max have a few records like million > / billion / crore (used in india) / lacs (india specific) and so on. > > Now what I wanted to ask was whether its any different to have the > primary-keys in such master tables as text/varchar rather than integer ? As far as I know it's just slower to compare (ie for fkey checks, index lookups, etc) and uses more storage. However, if you're only using the other table to limit possible values in a field rather than storing other information and you can avoid doing a join / index lookup by storing the string directly in the master table then that might well be worth it. It's a tradeoff between the storage cost (seq scan speed, index size, etc) of using the text values directly vs the savings made by avoiding having to constantly hit a lookup table. I have several places in the database I'm presently working on where I store meaningful integers directly in a "main" table and reference a single-field table as a foreign key just to limit acceptable values. It works very well, though it's only suitable in limited situations. One of the places I'm doing that is for in-database postcode validation. My current app only needs to validate Australian post codes (as per the spec) and other post/zip codes are just stored in the address text. I store the integer representation of the post code directly in address records but use a foreign key to the single-field "aust_post_code" table to enforce the use of only valid postcodes. There's an ON DELETE SET NULL cascade on the fkey because for this app's purpose a postcode that's no longer accepted by the postal service is bad data. This means that the postcode list can't be updated by a TRUNCATE and repopulate. No big deal; I prefer to do a compare between the current database contents and the latest postcode data and insert/delete as appropriate anyway; especially as the app needs to be able to record and flag tentative entries for postcodes that the user *insists* exist but the latest (possibly even weeks old) australia post data says do not. You could reasonably do the same sort of thing with a text postcode if your app had to care about non-numeric postal codes. It's nice being able to work on something that doesn't have to handle pedal-post in some awful corner of the earth where they identify postal regions by coloured tags. OK, not really, but sometimes addressing seems almost that bad. > i.e. Can I use a character varying(10) and use the text 'million' / > 'billion' instead of a serial / integer type ? If you're looking at a small set of possible values an enumeration *might* be an option. Be aware that they're painful and slow to change later, though, especially when used in foreign keys, views, etc. I certainly wouldn't use one for your currency denomination table, which is likely to see values added to it over time. > p.s.: I am not as much concerned with the size that it'd take on the data > tables, as much as the fact that the select / insert performances shouldn't > suffer. However, if that increase in size (per data record) may make a > considerable impact on the performance, I would certainly want to take that > into account during design phase. I suspect it's just another tradeoff - table size increase (and thus scan performance cost) from storing the text vs avoiding the need to access the lookup table for most operations. -- Craig Ringer
pgsql-performance by date: