The planner hates me. - Mailing list pgsql-general
From | Jeff Amiel |
---|---|
Subject | The planner hates me. |
Date | |
Msg-id | C17A452040EDB84AA7A10AEA334E3E1455D1C1@AD1 Whole thread Raw |
Responses |
Re: The planner hates me.
Re: The planner hates me. Re: The planner hates me. |
List | pgsql-general |
"PostgreSQL 8.2.4 on i386-pc-solaris2.10, compiled by GCC gcc (GCC) 3.4.3 (csl-sol210-3_4-branch+sol_rpath)" (test environment) Picture a table called 'transaction' with 1 million rows. most (> 99% of the records have date1 and date2 values in the past (spread over 4 years) 99.99% of the records have a state OTHER than 'I'. CREATE TABLE transaction ( amount numeric(10,2) NOT NULL, date1 date NOT NULL, state character(1) NOT NULL, date2 date DEFAULT date(now()) ) CREATE INDEX t_date1_index ON transaction USING btree (date1); CREATE INDEX t_date2_index ON transaction USING btree (date2); CREATE INDEX t_state_index ON transaction USING btree (state); explain analyze select sum(amount), dates.date as date from transaction t join (select get_dates as date from get_dates('09/17/08','09/24/08')) dates on (t.state='I' or t.date1 >= dates.date) group by dates.date get_dates simply returns each date between (and including) the passed dates ...and is a 'stable' function). Yes...I know I probably could have used a generate_series or something...but this was written before I knew anything about that "HashAggregate (cost=1290485.15..1290485.40 rows=200 width=17) (actual time=277.804..277.809 rows=8 loops=1)" " -> Nested Loop (cost=270.37..1123134.88 rows=334700533 width=17) (actual time=3.182..153.741 rows=120536 loops=1)" " -> Function Scan on get_dates (cost=0.00..1.25 rows=1000 width=4) (actual time=0.057..0.065 rows=8 loops=1)" " -> Bitmap Heap Scan on "transaction" t (cost=270.37..618.60 rows=336357 width=22) (actual time=3.093..10.958 rows=15067 loops=8)" " Recheck Cond: ((t.state = 'I'::bpchar) OR (t.date1 >= get_dates.get_dates))" " -> BitmapOr (cost=270.37..270.37 rows=336357 width=0) (actual time=2.853..2.853 rows=0 loops=8)" " -> Bitmap Index Scan on t_state_index (cost=0.00..8.97 rows=4971 width=0) (actual time=2.842..2.842 rows=15067 loops=8)" " Index Cond: (state = 'I'::bpchar)" " -> Bitmap Index Scan on t_date1_index (cost=0.00..252.99 rows=331387 width=0) (actual time=0.009..0.009 rows=2 loops=8)" " Index Cond: (t.date1 >= get_dates.get_dates)" "Total runtime: 277.883 ms" Uses bitmap scans for optimization and performs admirably. BUT...when I change the query thusly... (adding in an addition AND clause) explain analyze select sum(amount), dates.date as date from transaction t join (select get_dates as date from get_dates('09/17/08','09/24/08')) dates on (t.state='I' or t.date1 >= dates.date) and t.date2 < dates.date group by dates.date "HashAggregate (cost=1222618.09..1222618.34 rows=200 width=15) (actual time=7538.193..7538.199 rows=8 loops=1)" " -> Nested Loop (cost=0.00..1166174.15 rows=112887885 width=15) (actual time=0.889..7411.997 rows=120522 loops=1)" " Join Filter: ((t.state = 'I'::bpchar) OR (t.date1 >= get_dates.get_dates))" " -> Function Scan on get_dates (cost=0.00..1.25 rows=1000 width=4) (actual time=0.055..0.062 rows=8 loops=1)" " -> Index Scan using t_date2_index on "transaction" t (cost=0.00..590.77 rows=328800 width=24) (actual time=0.018..492.348 rows=986273 loops=8)" " Index Cond: (t.date2 < get_dates.get_dates)" "Total runtime: 7538.259 ms" I am pulling my hair out. Why does it insist on using the index on the date2 field? It was doing so well with the bitmap indexing on the other fields...if the planner would simply do the same with date2, all would be well. I really need The date2 portion of the query to filter AFTER the first two clauses, because those are quicker and narrows the data down to a manageable size.... Any suggestions on query or index changes? I still need the date2 index for other queries....but the fact that it gets used exclusively for this query drives me batty!. I've tried various combinations of multi-column indexes to no avail...because I still need the date2 index for other reasons, this query ALWAYS chooses it for some reason. Any help would be appreciated.
pgsql-general by date: