Thread: Varchar pkey instead of integer

Varchar pkey instead of integer

From
"Robins Tharakan"
Date:
Hi,

I am currently designing a database and wanted to know something that may sound trivial, but I thought its still good to confirm before dumping millions of rows in it.

The design requires a few master tables with very limited rows, for e.g. currency_denomination table could at the max have a few records like million / billion / crore (used in india) / lacs (india specific) and so on.

Now what I wanted to ask was whether its any different to have the primary-keys in such master tables as text/varchar rather than integer ? i.e. Can I use a character varying(10) and use the text 'million' / 'billion' instead of a serial / integer type ?

p.s.: I am not as much concerned with the size that it'd take on the data tables, as much as the fact that the select / insert performances shouldn't suffer. However, if that increase in size (per data record) may make a considerable impact on the performance, I would certainly want to take that into account during design phase.

Any pointers / replies appreciated.

Regards,
Robins Tharakan

Re: Varchar pkey instead of integer

From
"Joshua D. Drake"
Date:
Robins Tharakan wrote:
> Hi,

> Now what I wanted to ask was whether its any different to have the
> primary-keys in such master tables as text/varchar rather than integer ?
> i.e. Can I use a character varying(10) and use the text 'million' /
> 'billion' instead of a serial / integer type ?

One should ask themselves why before can I. :)

If you want to use a varchar() for a primary key that is fine but make
it a natural key not an arbitrary number. If you are going to use
arbitrary numbers, use a serial or bigserial.

Joshua D. Drake


Re: Varchar pkey instead of integer

From
Craig Ringer
Date:
Robins Tharakan wrote:
> Hi,
>
> I am currently designing a database and wanted to know something that may
> sound trivial, but I thought its still good to confirm before dumping
> millions of rows in it.
>
> The design requires a few master tables with very limited rows, for e.g.
> currency_denomination table could at the max have a few records like million
> / billion / crore (used in india) / lacs (india specific) and so on.
>
> Now what I wanted to ask was whether its any different to have the
> primary-keys in such master tables as text/varchar rather than integer ?

As far as I know it's just slower to compare (ie for fkey checks, index
lookups, etc) and uses more storage. However, if you're only using the
other table to limit possible values in a field rather than storing
other information and you can avoid doing a join / index lookup by
storing the string directly in the master table then that might well be
worth it. It's a tradeoff between the storage cost (seq scan speed,
index size, etc) of using the text values directly vs the savings made
by avoiding having to constantly hit a lookup table.

I have several places in the database I'm presently working on where I
store meaningful integers directly in a "main" table and reference a
single-field table as a foreign key just to limit acceptable values. It
works very well, though it's only suitable in limited situations.

One of the places I'm doing that is for in-database postcode validation.
My current app only needs to validate Australian post codes (as per the
spec) and other post/zip codes are just stored in the address text. I
store the integer representation of the post code directly in address
records but use a foreign key to the single-field "aust_post_code" table
to enforce the use of only valid postcodes. There's an ON DELETE SET
NULL cascade on the fkey because for this app's purpose a postcode
that's no longer accepted by the postal service is bad data.

This means that the postcode list can't be updated by a TRUNCATE and
repopulate. No big deal; I prefer to do a compare between the current
database contents and the latest postcode data and insert/delete as
appropriate anyway; especially as the app needs to be able to record and
flag tentative entries for postcodes that the user *insists* exist but
the latest (possibly even weeks old) australia post data says do not.

You could reasonably do the same sort of thing with a text postcode if
your app had to care about non-numeric postal codes.

It's nice being able to work on something that doesn't have to handle
pedal-post in some awful corner of the earth where they identify postal
regions by coloured tags. OK, not really, but sometimes addressing seems
almost that bad.

> i.e. Can I use a character varying(10) and use the text 'million' /
> 'billion' instead of a serial / integer type ?

If you're looking at a small set of possible values an enumeration
*might* be an option. Be aware that they're painful and slow to change
later, though, especially when used in foreign keys, views, etc.

I certainly wouldn't use one for your currency denomination table, which
is likely to see values added to it over time.

> p.s.: I am not as much concerned with the size that it'd take on the data
> tables, as much as the fact that the select / insert performances shouldn't
> suffer. However, if that increase in size (per data record) may make a
> considerable impact on the performance, I would certainly want to take that
> into account during design phase.

I suspect it's just another tradeoff - table size increase (and thus
scan performance cost) from storing the text vs avoiding the need to
access the lookup table for most operations.

--
Craig Ringer

Re: Varchar pkey instead of integer

From
Shane Ambler
Date:
Craig Ringer wrote:

>> p.s.: I am not as much concerned with the size that it'd take on the data
>> tables, as much as the fact that the select / insert performances
>> shouldn't
>> suffer. However, if that increase in size (per data record) may make a
>> considerable impact on the performance, I would certainly want to take
>> that
>> into account during design phase.
>
> I suspect it's just another tradeoff - table size increase (and thus
> scan performance cost) from storing the text vs avoiding the need to
> access the lookup table for most operations.
>

Size can affect performance as much as anything else. In your case of
limited rows it will make little difference, though the larger table
with millions of rows will have this key entered for each row and be
indexed as the foreign key.

The real question is how you want to use the column, if you wish to
query for rows of a certain currency then you will notice the difference.

You could use a smallint of 2 bytes each (or a varchar(1) with an int
value instead of a real char) or an integer of 4 bytes, compared to your
varchar(10)

Data size on the column could be less than half of the size of the
varchar(10) so there will be less disk reads (the biggest slow down) and
smaller indexes which can increase chances of caching.

Without storage overheads each million rows will have 10*1000000=10M
bytes of data compared to 4*1000000=4M bytes - you can see that the
chance of caching and the time to read off disk will come into effect
each time you reference that column.



--

Shane Ambler
pgSQL (at) Sheeky (dot) Biz

Get Sheeky @ http://Sheeky.Biz

Re: Varchar pkey instead of integer

From
Craig Ringer
Date:
Shane Ambler wrote:

> Size can affect performance as much as anything else. In your case of
> limited rows it will make little difference, though the larger table
> with millions of rows will have this key entered for each row and be
> indexed as the foreign key.
>
> The real question is how you want to use the column, if you wish to
> query for rows of a certain currency then you will notice the difference.
>
> You could use a smallint of 2 bytes each (or a varchar(1) with an int
> value instead of a real char) or an integer of 4 bytes, compared to your
> varchar(10)

... and if there are only a few records in the currency column, it
rarely changes, and you put a trigger in place to prevent the re-use of
previously assigned keys you may be able to cache that data in your
application.

That way you avoid a join or subquery on your lookup table to get the
text description of the currency AND get the storage/performance of a
small integer key.

It's something I'm doing in other places in my current DB where I have
essentially static lookup tables. You do have to watch out for lookup
table changes, though.

It's worth noting that my database is rather puny (the largest table has
500,000 records) and I'm very, very far from an expert on any of this,
so there might be some hidden downside to doing things this way that I
just haven't hit yet.

--
Craig Ringer

Re: Varchar pkey instead of integer

From
"J. Andrew Rogers"
Date:
On May 21, 2008, at 12:33 AM, Shane Ambler wrote:
>
> Size can affect performance as much as anything else.

For a brief moment, I thought the mailing list had been spammed. ;-)

J. Andrew Rogers


Re: Varchar pkey instead of integer

From
"Robins Tharakan"
Date:
On Wed, May 21, 2008 at 1:27 PM, J. Andrew Rogers <jrogers@neopolitan.com> wrote:

On May 21, 2008, at 12:33 AM, Shane Ambler wrote:

Size can affect performance as much as anything else.

For a brief moment, I thought the mailing list had been spammed. ;-)

And that sums up why I wish to thank everyone for the responses.. :)

Robins