Re: Varchar vs foreign key vs enumerator - table and index size - Mailing list pgsql-performance

From Tom Lane
Subject Re: Varchar vs foreign key vs enumerator - table and index size
Date
Msg-id 30071.1377962447@sss.pgh.pa.us
Whole thread Raw
In response to Varchar vs foreign key vs enumerator - table and index size  (Łukasz Walkowski <lukasz.walkowski@homplex.pl>)
Responses Re: Varchar vs foreign key vs enumerator - table and index size
List pgsql-performance
=?utf-8?Q?=C5=81ukasz_Walkowski?= <lukasz.walkowski@homplex.pl> writes:
> 3. And this part is most interesting for me. Columns browser, eventsource, eventtype, devicetype, operatingsystem
containa small pool of strings - for example for devicetype this is set to Computer, Mobile, Tablet or Unknown. Browser
isset to normalized browser name. In every case I can store those data using one of 3 different methods:  

> - store as varchar as it is now - nice and easy, but index on those columns is quite big and I think storing many of
similarstrings is waste of space.  

If you're starting to be concerned about space, it's definitely time to
get away from this choice.  Depending on what locale you're using,
comparing varchar values can be quite an expensive operation, too.

> - store only id's and join external tables as needed, for example for browsers I only need smallint key, as there is
alimited number of browsers. 

I think the main "pro" of this approach is that it doesn't use any
nonstandard SQL features, so you preserve your options to move to some
other database in the future.  The main "con" is that you'd be buying into
fairly significant rewriting of your application code, since just about
every query involving these columns would have to become a join.

FWIW, I'd be inclined to just use integer not smallint.  The space savings
from smallint is frequently illusory because of alignment considerations
--- for instance, an index on a single smallint column will *not* be any
smaller than one on a single int column.  And smallint has some minor
usage annoyances because it's a second-class citizen in the type promotion
hierarchy --- you may find yourself needing explicit casts to smallint
here and there.

> - introduce enumerator type for each of the column and store those values as enumerator. This one should be the most
spaceefficient, but it will be problematic in case of changing column values like browser or operatingsystem as
alteringenumerator isn't that simple.  

Space-wise this is going to be equivalent to the integer-foreign-key
solution.  It's much nicer from a notational standpoint, though, because
you don't need joins --- it's likely that you'd need few if any
application code changes to go this route.  (But I'd advise doing some
testing to verify that before you take it as a given.)

You're right though that enums are not a good option if you expect
frequent changes in the pool of allowed values.  I guess the question
is how often does that happen, in your application?  Adding a new value
from time to time isn't much of a problem unless you want to get picky
about how it sorts relative to existing values.  But you can't ever delete
an individual enum value, and we don't support renaming them either.
(Though if you're desperate, I believe a manual UPDATE on the pg_enum
catalog would work for that.)

Another thing to think about is whether you have auxiliary data about each
value that might usefully be stored as additional columns in the small
tables.  The enum approach doesn't directly handle that, though I suppose
you could still create small separate tables that use an enum column as
primary key.

            regards, tom lane


pgsql-performance by date:

Previous
From: Kevin Grittner
Date:
Subject: Re: How clustering for scale out works in PostgreSQL
Next
From: Łukasz Walkowski
Date:
Subject: Re: Varchar vs foreign key vs enumerator - table and index size