Thread: Improving performance of a query
Hi,
Is there a way to improve the performance of the following query?
SELECT * FROM SSIRRA where
(YEAR = 2004 and CUSTOMER = 0000000004 and CODE = 00 and PART >= 00) or
(YEAR = 2004 and CUSTOMER = 0000000004 and CODE > 00) or
(YEAR = 2004 and CUSTOMER > 0000000004) or
(YEAR > 2004)
(YEAR = 2004 and CUSTOMER = 0000000004 and CODE = 00 and PART >= 00) or
(YEAR = 2004 and CUSTOMER = 0000000004 and CODE > 00) or
(YEAR = 2004 and CUSTOMER > 0000000004) or
(YEAR > 2004)
Thanks in advance!
Benkendorf
__________________________________________________
Converse com seus amigos em tempo real com o Yahoo! Messenger
http://br.download.yahoo.com/messenger/
On Sat, Sep 03, 2005 at 09:02:27PM +0000, Carlos Benkendorf wrote: > Is there a way to improve the performance of the following query? > > SELECT * FROM SSIRRA where > (YEAR = 2004 and CUSTOMER = 0000000004 and CODE = 00 and PART >= 00) or > (YEAR = 2004 and CUSTOMER = 0000000004 and CODE > 00) or > (YEAR = 2004 and CUSTOMER > 0000000004) or > (YEAR > 2004) Could you post the EXPLAIN ANALYZE output of the query? It might also be useful to see the EXPLAIN ANALYZE output for each of those WHERE conditions individually. Also, what are the table and index definitions? How many rows are in the table? What version of PostgreSQL are you using? Do you vacuum and analyze regularly? In simple tests in 8.0.3 with random data -- which almost certainly has a different distribution than yours -- I see about a 10% improvement with a multi-column index on (year, customer, code, part) over using single-column indexes on each of those columns. Various multi-column indexes on two or three of the columns gave worse performance than single-column indexes. Your results will probably vary, however. In my tests, 8.1beta1 with its bitmap scans was about 20-25% faster than 8.0.3 with single-column indexes and about 35% faster with a four-column index. 8.1beta1's use of a four-column index was about 45% faster than 8.0.3's use of single-column indexes. Don't trust these numbers too much, though -- I simply inserted 20,000 random records into a table and ran the above query. -- Michael Fuhr
Carlos wrote: SELECT * FROM SSIRRA where (YEAR = 2004 and CUSTOMER = 0000000004 and CODE = 00 and PART >= 00) or (YEAR = 2004 and CUSTOMER = 0000000004 and CODE > 00) or (YEAR = 2004 and CUSTOMER > 0000000004) or (YEAR > 2004) [snip] ah, the positional query. You can always rewrite this query in the following form: (YEAR >= 2004) and (YEAR = 2004 or CUSTOMER >= 0000000004) and (YEAR = 2004 or CUSTOMER = 0000000004 or CODE >= 00) and (YEAR = 2004 or CUSTOMER = 0000000004 or CODE = 00 or PART > 00) This is better because it will index scan using 'year' (not customer or part though). The true answer is to lobby for/develop proper row constructor support so you can just SELECT * FROM SSIRRA where (YEAR, CUSTOMER, CODE, PART) > (2004, 0000000004, 00, 00) this is designed to do what you are trying to do but currently doesn't work quite right. note: in all these queries, 'order by YEAR, CUSTOMER, CODE, PART' should probably be on the query. Other solution: use cursor/fetch or some type of materialized solution. Merlin
On Tue, 6 Sep 2005, Merlin Moncure wrote: > Carlos wrote: > SELECT * FROM SSIRRA where > (YEAR = 2004 and CUSTOMER = 0000000004 and CODE = 00 and PART >= 00) or > (YEAR = 2004 and CUSTOMER = 0000000004 and CODE > 00) or > (YEAR = 2004 and CUSTOMER > 0000000004) or > (YEAR > 2004) > [snip] > > ah, the positional query. You can always rewrite this query in the > following form: > > (YEAR >= 2004) and > (YEAR = 2004 or CUSTOMER >= 0000000004) and > (YEAR = 2004 or CUSTOMER = 0000000004 or CODE >= 00) and > (YEAR = 2004 or CUSTOMER = 0000000004 or CODE = 00 or PART > 00) Unless I'm not seeing something, I don't think that's a correct reformulation in general. If customer < 4 and year > 2004 the original clause would return true but the reformulation would return false since (year=2004 or customer >= 4) would be false.
> > Carlos wrote: > > SELECT * FROM SSIRRA where > > (YEAR = 2004 and CUSTOMER = 0000000004 and CODE = 00 and PART >= 00) or > > (YEAR = 2004 and CUSTOMER = 0000000004 and CODE > 00) or > > (YEAR = 2004 and CUSTOMER > 0000000004) or > > (YEAR > 2004) > > [snip] > > > > ah, the positional query. You can always rewrite this query in the > > following form: > > > > (YEAR >= 2004) and > > (YEAR = 2004 or CUSTOMER >= 0000000004) and > > (YEAR = 2004 or CUSTOMER = 0000000004 or CODE >= 00) and > > (YEAR = 2004 or CUSTOMER = 0000000004 or CODE = 00 or PART > 00) > > Unless I'm not seeing something, I don't think that's a correct > reformulation in general. If customer < 4 and year > 2004 the original > clause would return true but the reformulation would return false since > (year=2004 or customer >= 4) would be false. You are correct, you also have to exchange '=' with '>' to exchange 'and' with 'or'. Correct answer is: > > (YEAR >= 2004) and > > (YEAR > 2004 or CUSTOMER >= 0000000004) and > > (YEAR > 2004 or CUSTOMER > 0000000004 or CODE >= 00) and > > (YEAR > 2004 or CUSTOMER > 0000000004 or CODE > 00 or PART > 00) It's easy to get tripped up here: the basic problem is how to get the next record based on a multi part key. My ISAM bridge can write them either way but the 'and' major form is always faster ;). MErlin