Re: Varchar vs foreign key vs enumerator - table and index size - Mailing list pgsql-performance
From | Tom Lane |
---|---|
Subject | Re: Varchar vs foreign key vs enumerator - table and index size |
Date | |
Msg-id | 30071.1377962447@sss.pgh.pa.us Whole thread Raw |
In response to | Varchar vs foreign key vs enumerator - table and index size (Łukasz Walkowski <lukasz.walkowski@homplex.pl>) |
Responses |
Re: Varchar vs foreign key vs enumerator - table and index size
|
List | pgsql-performance |
=?utf-8?Q?=C5=81ukasz_Walkowski?= <lukasz.walkowski@homplex.pl> writes: > 3. And this part is most interesting for me. Columns browser, eventsource, eventtype, devicetype, operatingsystem containa small pool of strings - for example for devicetype this is set to Computer, Mobile, Tablet or Unknown. Browser isset to normalized browser name. In every case I can store those data using one of 3 different methods: > - store as varchar as it is now - nice and easy, but index on those columns is quite big and I think storing many of similarstrings is waste of space. If you're starting to be concerned about space, it's definitely time to get away from this choice. Depending on what locale you're using, comparing varchar values can be quite an expensive operation, too. > - store only id's and join external tables as needed, for example for browsers I only need smallint key, as there is alimited number of browsers. I think the main "pro" of this approach is that it doesn't use any nonstandard SQL features, so you preserve your options to move to some other database in the future. The main "con" is that you'd be buying into fairly significant rewriting of your application code, since just about every query involving these columns would have to become a join. FWIW, I'd be inclined to just use integer not smallint. The space savings from smallint is frequently illusory because of alignment considerations --- for instance, an index on a single smallint column will *not* be any smaller than one on a single int column. And smallint has some minor usage annoyances because it's a second-class citizen in the type promotion hierarchy --- you may find yourself needing explicit casts to smallint here and there. > - introduce enumerator type for each of the column and store those values as enumerator. This one should be the most spaceefficient, but it will be problematic in case of changing column values like browser or operatingsystem as alteringenumerator isn't that simple. Space-wise this is going to be equivalent to the integer-foreign-key solution. It's much nicer from a notational standpoint, though, because you don't need joins --- it's likely that you'd need few if any application code changes to go this route. (But I'd advise doing some testing to verify that before you take it as a given.) You're right though that enums are not a good option if you expect frequent changes in the pool of allowed values. I guess the question is how often does that happen, in your application? Adding a new value from time to time isn't much of a problem unless you want to get picky about how it sorts relative to existing values. But you can't ever delete an individual enum value, and we don't support renaming them either. (Though if you're desperate, I believe a manual UPDATE on the pg_enum catalog would work for that.) Another thing to think about is whether you have auxiliary data about each value that might usefully be stored as additional columns in the small tables. The enum approach doesn't directly handle that, though I suppose you could still create small separate tables that use an enum column as primary key. regards, tom lane
pgsql-performance by date: