New Optimizer Behaviour In 7.0b1 - Mailing list pgsql-sql

From Mark Kirkwood
Subject New Optimizer Behaviour In 7.0b1
Date
Msg-id 38B5F103.68A06D0@ihug.co.nz
Whole thread Raw
List pgsql-sql
I have been playing a bit with this new release. There are lots of new
possible plans, which is really great.

I have been using the query shown below to study optimizer changes.

select      d0.d0f1,      count(f.f1)
from dim0 d0,    fact1 f
where d0.d0key = f.d0key
and   d0.d0f1 between '1999-11-01' and '1999-12-01'
group by d0.d0f1

Table setup is :
Table "fact1"  size  300000 rowsAttribute |  Type   | Modifier
-----------+---------+----------d0key     | integer |d1key     | integer |f1           | integer |
Index: fact1_q1 on d0key
Table "dim0"  size  900 rowsAttribute |    Type     | Modifier
-----------+-------------+----------d0key    | integer        |d0f1      | timestamp   |d0f2      | varchar(20) |d0f3
  | varchar(20) |
 
Indices: dim0_pk on d0key,             dim0_q1 on d0f1

Explain is :
Aggregate  (cost=12205.78..12372.44 rows=3333 width=20) ->  Group  (cost=12205.78..12289.11 rows=33333 width=20)
-> Sort  (cost=12205.78..12205.78 rows=33333 width=20)             ->  Hash Join  (cost=21.75..9371.33 rows=33333
width=20)                  ->  Seq Scan on fact1 f  (cost=0.00..4765.00
 
rows=300000 width=8)                   ->  Hash  (cost=21.50..21.50 rows=100 width=12)                         ->  Seq
Scanon dim0 d0  (cost=0.00..21.50
 
rows=100 width=12)

Initially this ran fairly slowly : 8-10s , the query scans about 9000
out the 300000 in the big table(fact1).
A bit of tweeking with the set variables : ( these are new -see
src/backend/commands/variable.c )
set cpu_tuple_cost = '0.6';
set enable_hashjoin = 'off';
set enable_mergejoin = 'off';

gave a new plan :
Aggregate  (cost=0.00..18476945.83 rows=3333 width=20) ->  Group  (cost=0.00..18476862.50 rows=33333 width=20)       ->
Nested Loop  (cost=0.00..18476779.16 rows=33333 width=20)             ->  Index Scan using dim0_q1 on dim0 d0
(cost=0.00..81.98
rows=100 width=12)             ->  Index Scan using fact1_q1 on fact1 f
(cost=0.00..4016.97 rows=1500 width=8)

which is devestatiingly fast... about 1 s. Note that the table order is
reversed and that the index on the big
table ( fact1) is used.

However it seems a bit on the brutal side to have to coerce the
optimizer this way ( after all hash joins are
generally good), is there any way to get a reasonably sensible use of
indexes without such desperate
measures ?

P.s : I realize that this is beta 1..... I am impressed, I have had no
problems relinking php4 and  subsequently apache for use with this
release - seems like a very good quality beta 1. well done guys!

Mark
(markir@ihug.co.nz,mark.kirkwood@hnz.co.nz )



pgsql-sql by date:

Previous
From: Tom Lane
Date:
Subject: Re: [SQL] date_part, too many results?
Next
From: Tom Lane
Date:
Subject: Re: [INTERFACES] arrays of varchar and "