Thread: Varchar pkey instead of integer
Hi,
I am currently designing a database and wanted to know something that may sound trivial, but I thought its still good to confirm before dumping millions of rows in it.
The design requires a few master tables with very limited rows, for e.g. currency_denomination table could at the max have a few records like million / billion / crore (used in india) / lacs (india specific) and so on.
Now what I wanted to ask was whether its any different to have the primary-keys in such master tables as text/varchar rather than integer ? i.e. Can I use a character varying(10) and use the text 'million' / 'billion' instead of a serial / integer type ?
p.s.: I am not as much concerned with the size that it'd take on the data tables, as much as the fact that the select / insert performances shouldn't suffer. However, if that increase in size (per data record) may make a considerable impact on the performance, I would certainly want to take that into account during design phase.
Any pointers / replies appreciated.
Regards,
Robins Tharakan
I am currently designing a database and wanted to know something that may sound trivial, but I thought its still good to confirm before dumping millions of rows in it.
The design requires a few master tables with very limited rows, for e.g. currency_denomination table could at the max have a few records like million / billion / crore (used in india) / lacs (india specific) and so on.
Now what I wanted to ask was whether its any different to have the primary-keys in such master tables as text/varchar rather than integer ? i.e. Can I use a character varying(10) and use the text 'million' / 'billion' instead of a serial / integer type ?
p.s.: I am not as much concerned with the size that it'd take on the data tables, as much as the fact that the select / insert performances shouldn't suffer. However, if that increase in size (per data record) may make a considerable impact on the performance, I would certainly want to take that into account during design phase.
Any pointers / replies appreciated.
Regards,
Robins Tharakan
Robins Tharakan wrote: > Hi, > Now what I wanted to ask was whether its any different to have the > primary-keys in such master tables as text/varchar rather than integer ? > i.e. Can I use a character varying(10) and use the text 'million' / > 'billion' instead of a serial / integer type ? One should ask themselves why before can I. :) If you want to use a varchar() for a primary key that is fine but make it a natural key not an arbitrary number. If you are going to use arbitrary numbers, use a serial or bigserial. Joshua D. Drake
Robins Tharakan wrote: > Hi, > > I am currently designing a database and wanted to know something that may > sound trivial, but I thought its still good to confirm before dumping > millions of rows in it. > > The design requires a few master tables with very limited rows, for e.g. > currency_denomination table could at the max have a few records like million > / billion / crore (used in india) / lacs (india specific) and so on. > > Now what I wanted to ask was whether its any different to have the > primary-keys in such master tables as text/varchar rather than integer ? As far as I know it's just slower to compare (ie for fkey checks, index lookups, etc) and uses more storage. However, if you're only using the other table to limit possible values in a field rather than storing other information and you can avoid doing a join / index lookup by storing the string directly in the master table then that might well be worth it. It's a tradeoff between the storage cost (seq scan speed, index size, etc) of using the text values directly vs the savings made by avoiding having to constantly hit a lookup table. I have several places in the database I'm presently working on where I store meaningful integers directly in a "main" table and reference a single-field table as a foreign key just to limit acceptable values. It works very well, though it's only suitable in limited situations. One of the places I'm doing that is for in-database postcode validation. My current app only needs to validate Australian post codes (as per the spec) and other post/zip codes are just stored in the address text. I store the integer representation of the post code directly in address records but use a foreign key to the single-field "aust_post_code" table to enforce the use of only valid postcodes. There's an ON DELETE SET NULL cascade on the fkey because for this app's purpose a postcode that's no longer accepted by the postal service is bad data. This means that the postcode list can't be updated by a TRUNCATE and repopulate. No big deal; I prefer to do a compare between the current database contents and the latest postcode data and insert/delete as appropriate anyway; especially as the app needs to be able to record and flag tentative entries for postcodes that the user *insists* exist but the latest (possibly even weeks old) australia post data says do not. You could reasonably do the same sort of thing with a text postcode if your app had to care about non-numeric postal codes. It's nice being able to work on something that doesn't have to handle pedal-post in some awful corner of the earth where they identify postal regions by coloured tags. OK, not really, but sometimes addressing seems almost that bad. > i.e. Can I use a character varying(10) and use the text 'million' / > 'billion' instead of a serial / integer type ? If you're looking at a small set of possible values an enumeration *might* be an option. Be aware that they're painful and slow to change later, though, especially when used in foreign keys, views, etc. I certainly wouldn't use one for your currency denomination table, which is likely to see values added to it over time. > p.s.: I am not as much concerned with the size that it'd take on the data > tables, as much as the fact that the select / insert performances shouldn't > suffer. However, if that increase in size (per data record) may make a > considerable impact on the performance, I would certainly want to take that > into account during design phase. I suspect it's just another tradeoff - table size increase (and thus scan performance cost) from storing the text vs avoiding the need to access the lookup table for most operations. -- Craig Ringer
Craig Ringer wrote: >> p.s.: I am not as much concerned with the size that it'd take on the data >> tables, as much as the fact that the select / insert performances >> shouldn't >> suffer. However, if that increase in size (per data record) may make a >> considerable impact on the performance, I would certainly want to take >> that >> into account during design phase. > > I suspect it's just another tradeoff - table size increase (and thus > scan performance cost) from storing the text vs avoiding the need to > access the lookup table for most operations. > Size can affect performance as much as anything else. In your case of limited rows it will make little difference, though the larger table with millions of rows will have this key entered for each row and be indexed as the foreign key. The real question is how you want to use the column, if you wish to query for rows of a certain currency then you will notice the difference. You could use a smallint of 2 bytes each (or a varchar(1) with an int value instead of a real char) or an integer of 4 bytes, compared to your varchar(10) Data size on the column could be less than half of the size of the varchar(10) so there will be less disk reads (the biggest slow down) and smaller indexes which can increase chances of caching. Without storage overheads each million rows will have 10*1000000=10M bytes of data compared to 4*1000000=4M bytes - you can see that the chance of caching and the time to read off disk will come into effect each time you reference that column. -- Shane Ambler pgSQL (at) Sheeky (dot) Biz Get Sheeky @ http://Sheeky.Biz
Shane Ambler wrote: > Size can affect performance as much as anything else. In your case of > limited rows it will make little difference, though the larger table > with millions of rows will have this key entered for each row and be > indexed as the foreign key. > > The real question is how you want to use the column, if you wish to > query for rows of a certain currency then you will notice the difference. > > You could use a smallint of 2 bytes each (or a varchar(1) with an int > value instead of a real char) or an integer of 4 bytes, compared to your > varchar(10) ... and if there are only a few records in the currency column, it rarely changes, and you put a trigger in place to prevent the re-use of previously assigned keys you may be able to cache that data in your application. That way you avoid a join or subquery on your lookup table to get the text description of the currency AND get the storage/performance of a small integer key. It's something I'm doing in other places in my current DB where I have essentially static lookup tables. You do have to watch out for lookup table changes, though. It's worth noting that my database is rather puny (the largest table has 500,000 records) and I'm very, very far from an expert on any of this, so there might be some hidden downside to doing things this way that I just haven't hit yet. -- Craig Ringer
On May 21, 2008, at 12:33 AM, Shane Ambler wrote: > > Size can affect performance as much as anything else. For a brief moment, I thought the mailing list had been spammed. ;-) J. Andrew Rogers
On Wed, May 21, 2008 at 1:27 PM, J. Andrew Rogers <jrogers@neopolitan.com> wrote:
And that sums up why I wish to thank everyone for the responses.. :)
Robins
For a brief moment, I thought the mailing list had been spammed. ;-)
On May 21, 2008, at 12:33 AM, Shane Ambler wrote:
Size can affect performance as much as anything else.
And that sums up why I wish to thank everyone for the responses.. :)
Robins