Thread: Question on simulating Enum Data type

Question on simulating Enum Data type

From
"Morgan Kita"
Date:
Ok this is noobish, but I am new to both databases and especially PostgreSQL.

I am planning out how I am going to set up my database schema, and I have quite a few fields in the different tables
where I would like to set up an enum type similar to C and C++ style enum types. Essentially these fields will have like
5-15 choices, and the tables themselves might have on the order of 10 million rows. Now I know I can pretty easily
simulate this with domain constraints.

However, I am a little concerned about performance in that case. If I use domain constraints and keep the choices as
strings then a string comparison will be done whenever I query on this field, right? I know an index will speed this up
quite a bit, but even so I may have to do tens of thousands of string compares if there are only 5 choices, right?

Ideally wouldn't it be better to store an integer field in the tables, and then keep a separate small map table? Then
the application could use the map table to look up the key and then do a query on the large table using only integer
compares?

Am I just being silly or am I not understanding something here? Maybe there is another way to do this?

Thanks,
Morgan

Re: Question on simulating Enum Data type

From
Michael Glaesemann
Date:
On Mar 18, 2005, at 11:18, Morgan Kita wrote:

> However, I am a little concerned about performance in that case. If I
> use domain constraints and keep the choices as strings then a string
> comparison will be done whenever I query on this field right? I know an
> index will speed this up quite a bit but even so I may have to do 10s
> of thousands of string compares if there are only 5 choices right?
>
> Ideally wouldn't it be better to store an integer field in the tables,
> and then keep a separate small map table? Then the application could
> use the map table to look up the key and then do a query on the large
> table using only integer compares?
>
> Am I just being silly or am I not understanding something here? Maybe
> there is another way to do this?

What you've described are the two common ways to approach this
situation. To know for sure which performs better in your
situation, I think you'll need to run benchmarks on your system. I
don't know whether any such comparison has been done as a reference.
(Yours could be a good one :) Perhaps someone else out there might have
experience with this or knowledge of the backend and chime in.

Michael Glaesemann
grzm myrealbox com


Re: Question on simulating Enum Data type

From
Tom Lane
Date:
"Morgan Kita" <mkita@verseon.com> writes:
> However, I am a little concerned about performance in that case. If I
> use domain constraints and keep the choices as strings then a string
> comparison will be done whenever I query on this field right?

Hm?  A domain constraint would cost cycles when storing into the table,
but not when querying it.  Joining to another table, on the other hand,
will cost you during queries.

If the strings are long, so that the amount of extra space involved adds
up to a lot, it'd probably make sense to go with integer codes.  But if
they're just a word or so then I'd lean to keeping it simple.
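
For reference, here is a minimal sketch of the two layouts being compared. It is an illustration under stated assumptions, not a recommendation: it uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, a plain CHECK constraint in place of a domain constraint, and hypothetical table and column names (item_a, item_b, status_map). It only shows where each layout does its work: the constraint is checked when a row is stored, and the map table has to be joined (or looked up beforehand) when the label is needed.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; all names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

# Layout A: keep the choice as a short string, restricted by a CHECK constraint
# (standing in for a domain constraint).
conn.execute("""
    CREATE TABLE item_a (
        id     INTEGER PRIMARY KEY,
        status TEXT NOT NULL CHECK (status IN ('new', 'active', 'retired'))
    )
""")

# Layout B: store an integer code and keep the labels in a small map table.
conn.execute("""
    CREATE TABLE status_map (
        code  INTEGER PRIMARY KEY,
        label TEXT NOT NULL UNIQUE
    )
""")
conn.execute("""
    CREATE TABLE item_b (
        id          INTEGER PRIMARY KEY,
        status_code INTEGER NOT NULL REFERENCES status_map(code)
    )
""")
conn.executemany(
    "INSERT INTO status_map VALUES (?, ?)",
    [(1, "new"), (2, "active"), (3, "retired")],
)

conn.execute("INSERT INTO item_a (status) VALUES ('active')")
conn.execute("INSERT INTO item_b (status_code) VALUES (2)")


def rejected(sql):
    """Return True if the statement violates a constraint."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


# Both layouts reject invalid values when the row is stored.
bad_a = rejected("INSERT INTO item_a (status) VALUES ('bogus')")
bad_b = rejected("INSERT INTO item_b (status_code) VALUES (99)")

# Reading the label back: layout A needs no join, layout B does.
labels_a = [row[0] for row in conn.execute("SELECT status FROM item_a")]
labels_b = [
    row[0]
    for row in conn.execute(
        "SELECT m.label FROM item_b b JOIN status_map m ON m.code = b.status_code"
    )
]
```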

            regards, tom lane

Re: Question on simulating Enum Data type

From
"Morgan Kita"
Date:
Hmm, still I wonder, won't the varchar/char compares be much slower than using a separate map table, grabbing the int
value, and then only doing int compares? Maybe somebody can enlighten me on the relative speed of queries involving
string compares vs queries on int compares.

Re: Question on simulating Enum Data type

From
Tom Lane
Date:
"Morgan Kita" <mkita@verseon.com> writes:
> Hmm still I wonder, won't the varchar/char compares be much slower
> than using a separate map table, grabbing the int value, and then only
> doing int compares?

Certainly a string compare is slower than an integer compare, but you
have to consider the context.  In database work what usually counts more
than any CPU effort is the amount of disk I/O --- and that means that
adding a table join to avoid a string compare is being penny-wise
and pound-foolish.

It'd be well worth your while to set up a few experimental comparisons
to see how this plays out in practice.  Either answer *could* be right
depending on your situation.
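
One way such an experimental comparison could be laid out is sketched below. The assumptions are significant: it uses Python's built-in sqlite3 module with an in-memory database, so it measures no disk I/O at all and says nothing about PostgreSQL itself. The table names, the five labels, and the row count are all made up. It only shows the shape of the experiment: the same filter run as a string compare, as an integer compare after a one-time lookup, and as a join against the map table. On a real PostgreSQL system the same three queries would be timed there instead, for example with EXPLAIN ANALYZE, at realistic table sizes and with the indexes you actually plan to use.

```python
import random
import sqlite3
import time

# Assumptions: SQLite in memory, hypothetical names, and a small row count
# so the sketch runs quickly. None of this reflects PostgreSQL timings.
N_ROWS = 50_000
LABELS = ["new", "active", "paused", "retired", "deleted"]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item_str (id INTEGER PRIMARY KEY, status TEXT NOT NULL)")
conn.execute("CREATE TABLE item_int (id INTEGER PRIMARY KEY, status_code INTEGER NOT NULL)")
conn.execute("CREATE TABLE status_map (code INTEGER PRIMARY KEY, label TEXT NOT NULL UNIQUE)")
conn.executemany("INSERT INTO status_map VALUES (?, ?)", list(enumerate(LABELS, start=1)))

# Fill both tables with the same randomly chosen values (fixed seed for repeatability).
rng = random.Random(0)
codes = [rng.randint(1, len(LABELS)) for _ in range(N_ROWS)]
conn.executemany("INSERT INTO item_str (status) VALUES (?)", [(LABELS[c - 1],) for c in codes])
conn.executemany("INSERT INTO item_int (status_code) VALUES (?)", [(c,) for c in codes])
conn.commit()


def timed(sql, params):
    """Run a single-row query and return (result, elapsed seconds)."""
    start = time.perf_counter()
    (count,) = conn.execute(sql, params).fetchone()
    return count, time.perf_counter() - start


# 1. Filter directly on the string column.
count_str, t_str = timed("SELECT count(*) FROM item_str WHERE status = ?", ("active",))

# 2. Look the code up once in the map table, then filter on the integer column.
(code,) = conn.execute("SELECT code FROM status_map WHERE label = ?", ("active",)).fetchone()
count_int, t_int = timed("SELECT count(*) FROM item_int WHERE status_code = ?", (code,))

# 3. Let the database do the lookup with a join on every query.
count_join, t_join = timed(
    "SELECT count(*) FROM item_int i JOIN status_map m ON m.code = i.status_code "
    "WHERE m.label = ?",
    ("active",),
)

print(f"string compare: {t_str:.6f}s  int compare: {t_int:.6f}s  join: {t_join:.6f}s")
```

All three queries should return the same count; only the timings differ, and those are only meaningful when measured on the target system.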

            regards, tom lane