Thread: Question on simulating Enum Data type
Ok this is noobish, but I am new to databases in general and to PostgreSQL in particular. I am planning out how I am going to set up my database schema, and I have quite a few fields in the different tables where I would like to set up an enum type similar to C and C++ style enum types. Essentially these fields will have something like 5-15 choices, and the tables themselves might have on the order of 10 million rows.

Now I know I can pretty easily simulate this with domain constraints. However, I am a little concerned about performance in that case. If I use domain constraints and keep the choices as strings, then a string comparison will be done whenever I query on this field, right? I know an index will speed this up quite a bit, but even so I may have to do tens of thousands of string compares if there are only 5 choices, right?

Ideally, wouldn't it be better to store an integer field in the tables and keep a separate small map table? Then the application could use the map table to look up the key and then do a query on the large table using only integer compares.

Am I just being silly, or am I not understanding something here? Maybe there is another way to do this?

Thanks,
Morgan
On Mar 18, 2005, at 11:18, Morgan Kita wrote:

> However, I am a little concerned about performance in that case. If I
> use domain constraints and keep the choices as strings then a string
> comparison will be done whenever I query on this field right? I know an
> index will speed this up quite a bit but even so I may have to do 10s
> of thousands of string compares if there are only 5 choices right?
>
> Ideally wouldn't it be better to store an integer field in the tables,
> and then keep a separate small map table? Then the application could
> use the map table to look up the key and then do a query on the large
> table using only integer compares?
>
> Am I just being silly or am I not understanding something here? Maybe
> there is another way to do this?

What you've described are the two common ways to approach this situation. To know accurately which will perform better in your situation, I think you'll need to run benchmarks on your system. I don't know whether any such comparison has been done as a reference. (Yours could be a good one :) Perhaps someone else out there has experience with this, or knowledge of the backend, and will chime in.

Michael Glaesemann
grzm myrealbox com
"Morgan Kita" <mkita@verseon.com> writes:
> However, I am a little concerned about performance in that case. If I
> use domain constraints and keep the choices as strings then a string
> comparison will be done whenever I query on this field right?

Hm? A domain constraint would cost cycles when storing into the table, but not when querying it. Joining to another table, on the other hand, will cost you during queries.

If the strings are long, so that the amount of extra space involved adds up to a lot, it'd probably make sense to go with integer codes. But if they're just a word or so then I'd lean to keeping it simple.

regards, tom lane
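For readers following along, the two layouts under discussion can be sketched roughly as below. This is only an illustration, not anything posted in the thread: it uses Python's built-in sqlite3 module so that it runs without a server, and a CHECK constraint stands in for a PostgreSQL domain constraint (SQLite has no CREATE DOMAIN). The table names (items_str, items_int, status_map), the column names, and the status values are all hypothetical.

```python
import sqlite3

# Hypothetical enum choices, for illustration only.
STATUSES = ["new", "active", "suspended", "closed", "archived"]

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Layout 1: keep the choice as a short string, validated on write.
# A CHECK constraint stands in for a PostgreSQL domain constraint here.
allowed = ", ".join("'%s'" % s for s in STATUSES)
conn.execute(
    "CREATE TABLE items_str ("
    " id INTEGER PRIMARY KEY,"
    " status TEXT NOT NULL CHECK (status IN (%s)))" % allowed
)

# Layout 2: a small map table plus an integer code in the large table.
conn.execute(
    "CREATE TABLE status_map (code INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE)"
)
conn.execute(
    "CREATE TABLE items_int ("
    " id INTEGER PRIMARY KEY,"
    " status_code INTEGER NOT NULL REFERENCES status_map(code))"
)
conn.executemany(
    "INSERT INTO status_map (code, name) VALUES (?, ?)", list(enumerate(STATUSES))
)

conn.execute("INSERT INTO items_str (status) VALUES ('active')")
conn.execute("INSERT INTO items_int (status_code) VALUES (1)")  # 1 maps to 'active'

# The constraint is checked when storing, so a bad value is rejected on insert.
try:
    conn.execute("INSERT INTO items_str (status) VALUES ('bogus')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Querying layout 1 needs no join; layout 2 joins (or looks up the code first).
by_string = conn.execute(
    "SELECT count(*) FROM items_str WHERE status = 'active'"
).fetchone()[0]
by_join = conn.execute(
    "SELECT count(*) FROM items_int i JOIN status_map m ON m.code = i.status_code"
    " WHERE m.name = 'active'"
).fetchone()[0]
```

The point of the sketch is where the work happens: the validity check runs on the write path, while the map table only comes into play when reading, either as a join or as a separate lookup done by the application.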
Hmm, still I wonder: won't the varchar/char compares be much slower than using a separate map table, grabbing the int value, and then only doing int compares? Maybe somebody can enlighten me on the relative speed of queries involving string compares vs. queries involving int compares.

-----Original Message-----
From: pgsql-novice-owner@postgresql.org [mailto:pgsql-novice-owner@postgresql.org] On Behalf Of Tom Lane
Sent: Thursday, March 17, 2005 6:58 PM
To: Morgan Kita
Cc: pgsql-novice@postgresql.org
Subject: Re: [NOVICE] Question on simulating Enum Data type

"Morgan Kita" <mkita@verseon.com> writes:
> However, I am a little concerned about performance in that case. If I
> use domain constraints and keep the choices as strings then a string
> comparison will be done whenever I query on this field right?

Hm? A domain constraint would cost cycles when storing into the table, but not when querying it. Joining to another table, on the other hand, will cost you during queries.

If the strings are long, so that the amount of extra space involved adds up to a lot, it'd probably make sense to go with integer codes. But if they're just a word or so then I'd lean to keeping it simple.
"Morgan Kita" <mkita@verseon.com> writes:
> Hmm still I wonder, won't the varchar/char compares be much slower
> than using a separate map table, grabbing the int value, and then only
> doing int compares?

Certainly a string compare is slower than an integer compare, but you have to consider the context. In database work what usually counts more than any CPU effort is the amount of disk I/O --- and that means that adding a table join to avoid a string compare is being penny-wise and pound-foolish.

It'd be well worth your while to set up a few experimental comparisons to see how this plays out in practice. Either answer *could* be right depending on your situation.

regards, tom lane
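One possible shape for such an experimental comparison is sketched below. It is an assumption-laden stand-in rather than a PostgreSQL benchmark: it uses Python's built-in sqlite3 module with an in-memory database, so it measures no disk I/O at all, and the timings it prints say nothing about how PostgreSQL would behave. The table names, status values, and row count are hypothetical; a real test would run the same three queries against PostgreSQL with realistically sized data.

```python
import random
import sqlite3
import time

STATUSES = ["new", "active", "suspended", "closed", "archived"]  # hypothetical
N_ROWS = 20000  # small stand-in; the thread talks about roughly 10 million rows


def build(conn):
    """Create both layouts and fill them with the same random choices."""
    conn.execute("CREATE TABLE status_map (code INTEGER PRIMARY KEY, name TEXT UNIQUE)")
    conn.execute("CREATE TABLE items_str (id INTEGER PRIMARY KEY, status TEXT)")
    conn.execute("CREATE TABLE items_int (id INTEGER PRIMARY KEY, status_code INTEGER)")
    conn.executemany("INSERT INTO status_map VALUES (?, ?)", list(enumerate(STATUSES)))
    rng = random.Random(0)  # fixed seed so both tables hold identical data
    codes = [rng.randrange(len(STATUSES)) for _ in range(N_ROWS)]
    conn.executemany(
        "INSERT INTO items_str (status) VALUES (?)", [(STATUSES[c],) for c in codes]
    )
    conn.executemany(
        "INSERT INTO items_int (status_code) VALUES (?)", [(c,) for c in codes]
    )
    conn.execute("CREATE INDEX items_str_status ON items_str (status)")
    conn.execute("CREATE INDEX items_int_status ON items_int (status_code)")
    conn.commit()


def timed(conn, sql, params, repeats=20):
    """Return (row count, best wall-clock seconds over `repeats` runs)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        count = conn.execute(sql, params).fetchone()[0]
        best = min(best, time.perf_counter() - start)
    return count, best


QUERIES = {
    # String column compared directly.
    "string compare": ("SELECT count(*) FROM items_str WHERE status = ?", ("active",)),
    # Integer column, resolved through a join to the map table.
    "join to map": (
        "SELECT count(*) FROM items_int i JOIN status_map m ON m.code = i.status_code"
        " WHERE m.name = ?",
        ("active",),
    ),
    # Integer column, with the code already looked up by the application.
    "pre-looked-up int": ("SELECT count(*) FROM items_int WHERE status_code = ?", (1,)),
}

conn = sqlite3.connect(":memory:")
build(conn)
results = {label: timed(conn, sql, params) for label, (sql, params) in QUERIES.items()}
for label, (count, seconds) in results.items():
    print("%-18s rows=%d best=%.6fs" % (label, count, seconds))
```

All three queries should return the same row count, which is a useful sanity check before reading anything into the timings. Since the reply above stresses disk I/O over CPU effort, the comparison only becomes meaningful once the tables are large enough that they no longer fit comfortably in memory.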