Thread: New Optimizer Behaviour In 7.0b1

New Optimizer Behaviour In 7.0b1

From
Mark Kirkwood
Date:
I have been playing a bit with this new release. There are lots of new
possible plans, which is really great.

I have been using the query shown below to study optimizer changes.

select   d0.d0f1, count(f.f1)
from     dim0 d0, fact1 f
where    d0.d0key = f.d0key
and      d0.d0f1 between '1999-11-01' and '1999-12-01'
group by d0.d0f1

Table setup is:

Table "fact1"  size 300000 rows

 Attribute |  Type   | Modifier
-----------+---------+----------
 d0key     | integer |
 d1key     | integer |
 f1        | integer |

Index: fact1_q1 on d0key

Table "dim0"  size 900 rows

 Attribute |    Type     | Modifier
-----------+-------------+----------
 d0key     | integer     |
 d0f1      | timestamp   |
 d0f2      | varchar(20) |
 d0f3      | varchar(20) |

Indices: dim0_pk on d0key,
         dim0_q1 on d0f1
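
For reference, the DDL is roughly as follows (a sketch reconstructed from
the descriptions above - whether dim0_pk is a unique index is an
assumption based on its name):

create table fact1 (
    d0key integer,
    d1key integer,
    f1    integer
);
create index fact1_q1 on fact1 (d0key);

create table dim0 (
    d0key integer,
    d0f1  timestamp,
    d0f2  varchar(20),
    d0f3  varchar(20)
);
create unique index dim0_pk on dim0 (d0key);  -- assumed unique, given the _pk name
create index dim0_q1 on dim0 (d0f1);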

Explain is:

Aggregate  (cost=12205.78..12372.44 rows=3333 width=20)
  ->  Group  (cost=12205.78..12289.11 rows=33333 width=20)
        ->  Sort  (cost=12205.78..12205.78 rows=33333 width=20)
              ->  Hash Join  (cost=21.75..9371.33 rows=33333 width=20)
                    ->  Seq Scan on fact1 f  (cost=0.00..4765.00 rows=300000 width=8)
                    ->  Hash  (cost=21.50..21.50 rows=100 width=12)
                          ->  Seq Scan on dim0 d0  (cost=0.00..21.50 rows=100 width=12)

Initially this ran fairly slowly: 8-10 s; the query touches about 9000
of the 300000 rows in the big table (fact1).
A bit of tweaking with the set variables (these are new - see
src/backend/commands/variable.c):
set cpu_tuple_cost = '0.6';
set enable_hashjoin = 'off';
set enable_mergejoin = 'off';

gave a new plan:

Aggregate  (cost=0.00..18476945.83 rows=3333 width=20)
  ->  Group  (cost=0.00..18476862.50 rows=33333 width=20)
        ->  Nested Loop  (cost=0.00..18476779.16 rows=33333 width=20)
              ->  Index Scan using dim0_q1 on dim0 d0  (cost=0.00..81.98 rows=100 width=12)
              ->  Index Scan using fact1_q1 on fact1 f  (cost=0.00..4016.97 rows=1500 width=8)

which is devastatingly fast... about 1 s. Note that the table order is
reversed and that the index on the big table (fact1) is used.
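
(These settings are per-session, so they can be inspected and put back
once you are done experimenting - a sketch; I believe 0.01 is the shipped
default for cpu_tuple_cost, but check variable.c:)

show cpu_tuple_cost;          -- inspect the current value
set enable_hashjoin = 'on';   -- re-enable the suppressed join strategies
set enable_mergejoin = 'on';
set cpu_tuple_cost = '0.01';  -- assumed shipped default; confirm in variable.c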

However, it seems a bit brutal to have to coerce the optimizer this way
(after all, hash joins are generally good). Is there any way to get
reasonably sensible use of indexes without such desperate measures?

P.S.: I realize that this is beta 1... I am impressed: I have had no
problems relinking php4 and subsequently apache for use with this
release - it seems like a very good quality beta 1. Well done, guys!

Mark
(markir@ihug.co.nz,mark.kirkwood@hnz.co.nz )

P.P.S.: Hopefully this is not going to appear twice on this list; my
first send bounced.


Re: [SQL] New Optimizer Behaviour In 7.0b1

From
Tom Lane
Date:
Mark Kirkwood <markir@ihug.co.nz> writes:
> I have been using the query shown below to study optimizer changes.

> [ results snipped ]

> However, it seems a bit brutal to have to coerce the optimizer this
> way (after all, hash joins are generally good). Is there any way to
> get reasonably sensible use of indexes without such desperate
> measures?

Obviously we'd like to get the optimizer to do the right thing without
being beaten over the head ;-).  As you see, we're not there yet.

I will be the first to say that the 7.0 optimizer is still in a pretty
crude state: I have made drastic changes to its model of plan costs,
and have not yet had much time to tune the results.  I do appreciate
reports about cases it gets wrong.

In this case I guess the first question to ask is whether its
selectivity estimates are any good.  It seems to be estimating that your
"d0.d0f1 between '1999-11-01' and '1999-12-01'" clause will select about
100 of the 900 rows in dim0; is that anywhere near right?  Also, in the
nested-loop plan we can see that it thinks about 1500 rows from fact1
will match on "d0.d0key = f.d0key" against any given selected row from
dim0; is that on the mark?  Finally, is the estimate that the total
number of joined rows (before GROUP BY) is about 33333 any good?

If you have not done VACUUM ANALYZE recently on these two tables,
it'd be worth trying that to see if it brings the estimates any
closer to reality.
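That is:

vacuum analyze dim0;
vacuum analyze fact1;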
        regards, tom lane


Re: [SQL] New Optimizer Behaviour In 7.0b1

From
Mark Kirkwood
Date:
Tom Lane wrote:

> In this case I guess the first question to ask is whether its
> selectivity estimates are any good.  It seems to be estimating that your
> "d0.d0f1 between '1999-11-01' and '1999-12-01'" clause will select about
> 100 of the 900 rows in dim0; is that anywhere near right?  Also, in the
> nested-loop plan we can see that it thinks about 1500 rows from fact1
> will match on "d0.d0key = f.d0key" against any given selected row from
> dim0; is that on the mark?  Finally, is the estimate that the total
> number of joined rows (before GROUP BY) is about 33333 any good?
>
> If you have not done VACUUM ANALYZE recently on these two tables,
> it'd be worth trying that to see if it brings the estimates any
> closer to reality.
>
>                         regards, tom lane

Tom,

Here is the row data for comparison with the selectivity estimates:

select count(*) from dim0 d0
where d0.d0f1 between '1999-11-01' and '1999-12-01'    =>    31 rows

select count(*) from fact1 where d0key = <value>       =>  3000 rows

total number of joined rows before group by            =>  9000 rows

(i.e. there are only 3 distinct d0key values in fact1 for the "month",
and each one has 3000 rows)

It looks like the estimates for the big table (fact1) are the right order
of magnitude, but the small-table (dim0) ones are too high, and are
presumably throwing the rest off.

I did a vacuum analyze of these tables again, just in case... (no change
to the plans)

Cheers

Mark



Re: [SQL] New Optimizer Behaviour In 7.0b1

From
Tom Lane
Date:
Mark Kirkwood <markir@ihug.co.nz> writes:
> It looks like the estimates for the big table (fact1) are the right order
> of magnitude, but the small-table (dim0) ones are too high, and are
> presumably throwing the rest off.

At least part of the problem is that the selectivity code doesn't know
how to deal with date/time datatypes :-(.  I was waiting for Thomas to
commit the great date/time reorganization before I wrote that code.
But he's done now, so I've gone in and fixed convert_to_scalar to know
about all the surviving date/time types.

If you like, you can pick up the updated version of 
src/backend/utils/adt/selfuncs.c from the CVS server, or grab a
snapshot tarball dated later than this message, and see whether
the selectivity estimates get any better.
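
An easy check is whether a range restriction on the timestamp column now
gets a realistic row estimate (a sketch; the true count from your data
is 31):

-- The estimated row count here should move toward 31
-- rather than a default guess:
explain
select count(*) from dim0
where d0f1 between '1999-11-01' and '1999-12-01';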
        regards, tom lane


Re: [SQL] New Optimizer Behaviour In 7.0b1

From
Mark Kirkwood
Date:
Tom Lane wrote:

>
> If you like, you can pick up the updated version of
> src/backend/utils/adt/selfuncs.c from the CVS server, or grab a
> snapshot tarball dated later than this message, and see whether
> the selectivity estimates get any better.
>
>                         regards, tom lane

I will give one of those a go....

Cheers

Mark



RE: DATE Formatting Problem Again

From
Mark Kirkwood
Date:
>I am searching for the date formatting solution, but didn't find a
>correct one.
>
>For example ORACLE has
>    - the to_char function to make any kind of date format from a date
>      type: to_char(now,'yyyy.mm.dd')
>    - to_date to make a date type from any kind of string:
>      to_date('1999.11.23','yyyy.mm.dd')
>(snipped...)

I have wrestled a bit with this myself - being an Oracle DBA (but I'm
alright now...). The date <-> char conversion is very different in
PostgreSQL.

However, I have found that the "date_part" and "interval" functions seem
to do the job OK, sometimes combined with the cast operator "::datetime".

e.g. (for release 7.0):

date -> number

db1=> select date_part('year', now());
 date_part
-----------
      2000

and char -> interval

db1=> select interval('4 hours'::interval);
 ?column?
----------
 04:00

and char -> datetime via interval

db1=> select ('1999-01-01 ' || interval('4 hours'::interval))::datetime;
        ?column?
------------------------
 1999-01-01 04:00:00+13

I hope this helps a bit...

Cheers

Mark



Re: [SQL] RE: DATE Formatting Problem Again

From
Karel Zak - Zakkr
Date:


On Thu, 2 Mar 2000, Mark Kirkwood wrote:

> >I am searching for the date formatting solution, but didn't find a
> >correct one.
> >
> >For example ORACLE has
> >    - the to_char function to make any kind of date format from a date
> >      type: to_char(now,'yyyy.mm.dd')
> >    - to_date to make a date type from any kind of string:
> >      to_date('1999.11.23','yyyy.mm.dd')
> >(snipped...)
>
> I have wrestled a bit with this myself - being an Oracle DBA (but I'm
> alright now...). The date <-> char conversion is very different in
> PostgreSQL.
>
> However, I have found that the "date_part" and "interval" functions seem
> to do the job OK, sometimes combined with the cast operator "::datetime".
>
> e.g. (for release 7.0):
> date -> number
Release 7.0 has to_char/to_date/to_timestamp functions too, and they are
very compatible with Oracle's to_char(). See the PostgreSQL documentation
for more details.
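
For example (the format masks follow Oracle's conventions):

-- timestamp -> formatted string
select to_char(now(), 'YYYY.MM.DD');

-- string -> date
select to_date('1999.11.23', 'YYYY.MM.DD');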
                    Karel