The planner hates me. - Mailing list pgsql-general

From Jeff Amiel
Subject The planner hates me.
Date
Msg-id C17A452040EDB84AA7A10AEA334E3E1455D1C1@AD1
Whole thread Raw
Responses Re: The planner hates me.
Re: The planner hates me.
Re: The planner hates me.
List pgsql-general
"PostgreSQL 8.2.4 on i386-pc-solaris2.10, compiled by GCC gcc (GCC)
3.4.3 (csl-sol210-3_4-branch+sol_rpath)"
(test environment)

Picture a table called 'transaction' with 1 million rows.
most (> 99% of the records have date1 and date2 values in the past
(spread over 4 years)
99.99% of the records have a state OTHER than 'I'.


CREATE TABLE transaction
(
  amount numeric(10,2) NOT NULL,
  date1 date NOT NULL,
  state character(1) NOT NULL,
  date2 date DEFAULT date(now())
)

CREATE INDEX t_date1_index
  ON transaction
  USING btree
  (date1);

CREATE INDEX t_date2_index
  ON transaction
  USING btree
  (date2);

CREATE INDEX t_state_index
  ON transaction
  USING btree
  (state);



explain analyze
     select sum(amount),  dates.date as date
        from transaction t
         join (select get_dates as date from
get_dates('09/17/08','09/24/08')) dates on
        (t.state='I' or   t.date1 >=  dates.date)
           group by dates.date

get_dates simply returns each date between (and including) the passed
dates ...and is a 'stable' function).  Yes...I know I probably could
have used a generate_series or something...but this was written before I
knew anything about that

"HashAggregate  (cost=1290485.15..1290485.40 rows=200 width=17) (actual
time=277.804..277.809 rows=8 loops=1)"
"  ->  Nested Loop  (cost=270.37..1123134.88 rows=334700533 width=17)
(actual time=3.182..153.741 rows=120536 loops=1)"
"        ->  Function Scan on get_dates  (cost=0.00..1.25 rows=1000
width=4) (actual time=0.057..0.065 rows=8 loops=1)"
"        ->  Bitmap Heap Scan on "transaction" t  (cost=270.37..618.60
rows=336357 width=22) (actual time=3.093..10.958 rows=15067 loops=8)"
"              Recheck Cond: ((t.state = 'I'::bpchar) OR (t.date1 >=
get_dates.get_dates))"
"              ->  BitmapOr  (cost=270.37..270.37 rows=336357 width=0)
(actual time=2.853..2.853 rows=0 loops=8)"
"                    ->  Bitmap Index Scan on t_state_index
(cost=0.00..8.97 rows=4971 width=0) (actual time=2.842..2.842 rows=15067
loops=8)"
"                          Index Cond: (state = 'I'::bpchar)"
"                    ->  Bitmap Index Scan on t_date1_index
(cost=0.00..252.99 rows=331387 width=0) (actual time=0.009..0.009 rows=2
loops=8)"
"                          Index Cond: (t.date1 >= get_dates.get_dates)"
"Total runtime: 277.883 ms"


Uses bitmap scans for optimization and performs admirably.

BUT...when I change the query thusly... (adding in an addition AND
clause)

explain analyze
     select sum(amount),  dates.date as date
        from transaction t
         join (select get_dates as date from
get_dates('09/17/08','09/24/08')) dates on
        (t.state='I' or   t.date1 >=  dates.date)    and t.date2
< dates.date
           group by dates.date

"HashAggregate  (cost=1222618.09..1222618.34 rows=200 width=15) (actual
time=7538.193..7538.199 rows=8 loops=1)"
"  ->  Nested Loop  (cost=0.00..1166174.15 rows=112887885 width=15)
(actual time=0.889..7411.997 rows=120522 loops=1)"
"        Join Filter: ((t.state = 'I'::bpchar) OR (t.date1 >=
get_dates.get_dates))"
"        ->  Function Scan on get_dates  (cost=0.00..1.25 rows=1000
width=4) (actual time=0.055..0.062 rows=8 loops=1)"
"        ->  Index Scan using t_date2_index on "transaction" t
(cost=0.00..590.77 rows=328800 width=24) (actual time=0.018..492.348
rows=986273 loops=8)"
"              Index Cond: (t.date2 < get_dates.get_dates)"
"Total runtime: 7538.259 ms"

I am pulling my hair out.  Why does it insist on using the index on the
date2 field?   It was doing so well with the bitmap indexing on the
other fields...if the planner would simply do the same with date2, all
would be well.  I really need The date2 portion of the query to filter
AFTER the first two clauses, because those are quicker and narrows the
data down to a manageable size....


Any suggestions on query or index changes?  I still need the date2 index
for other queries....but the fact that it gets used exclusively for this
query drives me batty!.

I've tried various combinations of multi-column indexes to no
avail...because I still need the date2 index for other reasons, this
query ALWAYS chooses it for some reason.


Any help would be appreciated.







pgsql-general by date:

Previous
From: Louis-David Mitterrand
Date:
Subject: group by error message?
Next
From: Justin Yao
Date:
Subject: how can I find out the numeric directory name of each database in PostgreSQL 8.3