Thread: Q: Structured index - which one runs faster?
Hi all, sorry for reposting this to the lists, but I feel I posted this at the wrong time of day, since now a lot more of you gurus are reading, and I really need some knowledgeable input... thanks for consideration :) I have a question concerning table/key layout. I need to store an ID value that consists of three numerical elements: - ident1 char(5) - ident2 char(5) - nodeid int4 I need an index on these columns. Insert, delete, and lookup operations this in this need to be as fast as possible. Now I have two options: (a) creating an index on all three columns, or (b) create a single varchar column combining all three components into a single string, like "ident1:ident2:nodeid" and indexing this column only. There will be a couple of million rows in this table, the values in question are not unique. Which would be faster in your opinion? (a) or (b)? Thanks for any insight, -- >O Ernest E. Vogelsinger (\) ICQ #13394035 ^ http://www.vogelsinger.at/ ---------------------------(end of broadcast)--------------------------- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly -- >O Ernest E. Vogelsinger (\) ICQ #13394035 ^ http://www.vogelsinger.at/
Ernest E Vogelsinger <ernest@vogelsinger.at> writes: > (a) creating an index on all three columns, or > (b) create a single varchar column combining all three components into a > single string, like "ident1:ident2:nodeid" and indexing this column only. I can't imagine that (b) is a good idea ... it's dubious that you are saving anything on the indexing, and you're sure adding a lot of space to the table, not to mention maintenance effort, potential for bugs, etc. It might be worth creating the index so that the "least non-unique" column is mentioned first, if there's a clear winner in those terms. That would minimize the number of times that comparisons have to look at the additional columns. regards, tom lane
At 00:53 23.05.2003, Tom Lane said: --------------------[snip]-------------------- >Ernest E Vogelsinger <ernest@vogelsinger.at> writes: >> (a) creating an index on all three columns, or >> (b) create a single varchar column combining all three components into a >> single string, like "ident1:ident2:nodeid" and indexing this column only. > >I can't imagine that (b) is a good idea ... it's dubious that you are >saving anything on the indexing, and you're sure adding a lot of space >to the table, not to mention maintenance effort, potential for bugs, >etc. > >It might be worth creating the index so that the "least non-unique" >column is mentioned first, if there's a clear winner in those terms. >That would minimize the number of times that comparisons have to look at >the additional columns. --------------------[snip]-------------------- Thanks for replying :) Do you know if there's a general performance difference between numeric (int4) and character (fixed-size char[5]) columns? The ident1 and ident2 columns are planned to be char[5], only the third column (with least precedence) will be numeric. The application is still in the design phase, so I still could fiddle around that and make that char[5] numeric with an additional mapping (@runtime, not in the DB) if this will increase performance. Thanks, -- >O Ernest E. Vogelsinger (\) ICQ #13394035 ^ http://www.vogelsinger.at/
Ernest E Vogelsinger <ernest@vogelsinger.at> writes: > Do you know if there's a general performance difference between numeric > (int4) and character (fixed-size char[5]) columns? The ident1 and ident2 > columns are planned to be char[5], only the third column (with least > precedence) will be numeric. int4 is certainly faster to compare than char(n), but I wouldn't contort your database design on that basis... if the idents aren't naturally integers, don't force them to be. regards, tom lane
Ernest- > (a) creating an index on all three columns, or > (b) create a single varchar column combining all three components into a > single string, like "ident1:ident2:nodeid" and indexing this column only. > > There will be a couple of million rows in this table, the values in > question are not unique. I'd go with (a). (b) is not very flexible (e.g., lookup by ident2 only), and any speed advantage will require knowing in advance the optimal key order (i1:i2:n v. n:i2:i1 v. ...). I'd expect it would be comparable to a multi-column index for speed. (a) can really be implemented in 3 ways: (a1) an index of all 3 columns (a2) an index on /each/ of 3 columns (a3) a multi-column index AND separate indices on the others. e.g., index (i1,i2,n), and index (i2) and index (n) The choice of which is fastest depends a lot on the distribution of keys in each column and whether you need to do lookups on only one or two columns. Again, once you choose (b), you're kinda stuck with treating the compound key as a single entity (without incurring a big performance hit); (a) will allow you to experiment with optimal indexing without affecting code. Since it sounds like you've already got the data loaded, I (probably others) would be interested in any timing runs you do. -Reece -- Reece Hart, Ph.D. rkh@gene.com, http://www.gene.com/ Genentech, Inc. 650/225-6133 (voice), -5389 (fax) Bioinformatics and Protein Engineering 1 DNA Way, MS-93 http://www.in-machina.com/~reece/ South San Francisco, CA 94080-4990 reece@in-machina.com, GPG: 0x25EC91A0
On Thu, 22 May 2003, Ernest E Vogelsinger wrote: > Hi all, > > sorry for reposting this to the lists, but I feel I posted this at the > wrong time of day, since now a lot more of you gurus are reading, and I > really need some knowledgeable input... thanks for consideration :) > > > I have a question concerning table/key layout. > > I need to store an ID value that consists of three numerical elements: > - ident1 char(5) > - ident2 char(5) > - nodeid int4 > > I need an index on these columns. Insert, delete, and lookup operations > this in this need to be as fast as possible. Now I have two options: > > (a) creating an index on all three columns, or > (b) create a single varchar column combining all three components into a > single string, like "ident1:ident2:nodeid" and indexing this column only. > > There will be a couple of million rows in this table, the values in > question are not unique. > > Which would be faster in your opinion? (a) or (b)? Generally speaking, b should be faster, but a should be more versatile.