Thread: Poor Plan selected w/ not provided a date/time but selecting date/time from a table

Query2 is way faster mainly because the plan does not choose a seq scan
on a table w/ >20million rows.
The only difference between Query 1 and query 2 is that the
audio_key_dtime is chosen from a table rather than provided on the
query.

I'm not sure why this is the case and why it chooses such plans.
(should I be posting to pg-performance?)


Query1
=====

select
foo,
bar,
foobar
from
A,
join B
on a.id = b.id
join C
on c.id = b.id
and c.start_dtime = b.start_dtime
where audit_key_dtime >= (select last_refreshed from denorm_log where tablename = 'zon')
and   audit_key_dtime <  (select last_refreshed + refresh_interval from denorm_log where tablename = 'zon')

Query2
=====

select
foo,
bar,
foobar
from
A,
join B
on a.id = b.id
join C
on c.id = b.id
and c.start_dtime = b.start_dtime
where audit_key_dtime >= '2007-08-08 18:00:00'
and   audit_key_dtime <  '2007-08-08 18:01:00'


structure of denorm_log (very small table, max 10 rows)
=======================

                        Table "xmms.denorm_log"
         Column          |            Type             |   Modifiers
-------------------------+-----------------------------+---------------
 table_name              | text                        | not null
 denorm_table_name       | text                        | not null
 last_refreshed          | timestamp without time zone |
 refresh_interval        | interval                    |
 record_update_date_time | timestamp without time zone | default now()
 sql_delete              | text                        |
 sql_insert              | text                        |
Indexes:
    "denorm_log_pkey" PRIMARY KEY, btree (table_name)



Plan 1
======

HashAggregate  (cost=478784.63..478784.92 rows=1 width=139) (actual time=60593.583..60601.532 rows=82 loops=1)
  InitPlan
    ->  Index Scan using denorm_log_pkey on denorm_log  (cost=0.00..8.27 rows=1 width=8) (actual time=0.040..0.045
rows=1loops=1) 
          Index Cond: (table_name = 'zon'::text)
    ->  Index Scan using denorm_log_pkey on denorm_log  (cost=0.00..8.27 rows=1 width=24) (actual time=0.015..0.016
rows=1loops=1) 
          Index Cond: (table_name = 'zon'::text)
  ->  Nested Loop  (cost=387334.78..478768.01 rows=1 width=139) (actual time=46024.153..59479.035 rows=1545 loops=1)
        ->  Hash Join  (cost=387334.78..478759.22 rows=1 width=109) (actual time=45999.776..58651.991 rows=1545
loops=1)
              Hash Cond: (((trz.id)::text = (ts.id)::text) AND (trz.start_dtime = ts.start_dtime))
              ->  Index Scan using idx_trz_uptime on zon trz  (cost=0.00..3252.87 rows=65008 width=54) (actual
time=25.905..240.211rows=1545 loops=1) 
                    Index Cond: ((audit_key_dtime >= $0) AND (audit_key_dtime < $1))
                    Filter: (pber_err_rate_hd_zn_2 <> 0::numeric)
              ->  Hash  (cost=242589.91..242589.91 rows=5518191 width=63) (actual time=45873.643..45873.643
rows=5518191loops=1) 
                    ->  Seq Scan on ts  (cost=0.00..242589.91 rows=5518191 width=63) (actual time=355.084..28490.120
rows=5518191loops=1) 
        ->  Index Scan using d_pkey on drive d  (cost=0.00..8.70 rows=1 width=62) (actual time=0.506..0.510 rows=1
loops=1545)
              Index Cond: ((d.id)::text = (trz.id)::text)
Total runtime: 60607.891 ms


Plan 2 - No Seq Scans
======

HashAggregate  (cost=28.16..28.46 rows=1 width=139) (actual time=2240.899..2248.964 rows=82 loops=1)
  ->  Nested Loop  (cost=0.00..28.08 rows=1 width=139) (actual time=485.178..1175.938 rows=1545 loops=1)
        ->  Nested Loop  (cost=0.00..19.29 rows=1 width=109) (actual time=485.032..1125.322 rows=1545 loops=1)
              ->  Index Scan using idx_trz_uptime on zon trz  (cost=0.00..9.70 rows=1 width=54) (actual
time=24.481..33.667rows=1545 loops=1) 
                    Index Cond: ((audit_key_dtime >= '2007-08-08 18:00:00'::timestamp without time zone) AND
(audit_key_dtime< '2007-08-08 18:01:00'::timestamp without time zone)) 
                    Filter: (web_id <> 0::numeric)
              ->  Index Scan using idx_ts_sn_sdate_ttype on ts  (cost=0.00..9.58 rows=1 width=63) (actual
time=0.696..0.698rows=1 loops=1545) 
                    Index Cond: (((ts.id)::text = (trz.id)::text) AND (ts.start_dtime = trz.start_dtime))
        ->  Index Scan using d_pkey on  d  (cost=0.00..8.70 rows=1 width=62) (actual time=0.012..0.015 rows=1
loops=1545)
              Index Cond: ((d.id)::text = (trz.id)::text)
Total runtime: 2250.075 ms

Ow Mun Heng wrote:
> Query2 is way faster mainly because the plan does not choose a seq scan
> on a table w/ >20million rows.
> The only difference between Query 1 and query 2 is that the
> audio_key_dtime is chosen from a table rather than provided on the
> query.
>
> I'm not sure why this is the case and why it chooses such plans.
> (should I be posting to pg-performance?)

Your query plans don't seem to match your queries. That makes it
difficult to provide meaningful advice.


--
   Richard Huxton
   Archonet Ltd

On Wed, 2007-10-17 at 20:37 +0100, Richard Huxton wrote:
> Ow Mun Heng wrote:
> > Query2 is way faster mainly because the plan does not choose a seq scan
> > on a table w/ >20million rows.
> > The only difference between Query 1 and query 2 is that the
> > audio_key_dtime is chosen from a table rather than provided on the
> > query.
> >
> > I'm not sure why this is the case and why it chooses such plans.
> > (should I be posting to pg-performance?)
>
> Your query plans don't seem to match your queries. That makes it
> difficult to provide meaningful advice.
>
>

Well, then that makes both you and me(both) stumped. because the 2
queries are exactly the same except for the data part.


Index Cond: ((audit_key_dtime >= $0) AND (audit_key_dtime < $1))


Index Cond: ((audit_key_dtime >= '2007-08-08 18:00:00'::timestamp without time zone)
          AND (audit_key_dtime < '2007-08-08 18:01:00'::timestamp without time zone))

This is _the_ only difference between the 2 queries where on one, the
dates are provided, and the other is selected from a table.

I have no idea why the plans are so different between the two.




Ow Mun Heng wrote:

> Index Cond: ((audit_key_dtime >= $0) AND (audit_key_dtime < $1))
>
>
> Index Cond: ((audit_key_dtime >= '2007-08-08 18:00:00'::timestamp without time zone)
>           AND (audit_key_dtime < '2007-08-08 18:01:00'::timestamp without time zone))
>
> This is _the_ only difference between the 2 queries where on one, the
> dates are provided, and the other is selected from a table.
>
> I have no idea why the plans are so different between the two.

The difference is that it has to consider the worst possibility in the
second case, whereas the other one knows the interval is just one minute.

I didn't read the entire thread but I've seen similar things go much
better when you grab the constants beforehand and interpolate them into
the query by yourself.  Yes, it's really ugly.

--
Alvaro Herrera                         http://www.flickr.com/photos/alvherre/
"Crear es tan difícil como ser libre" (Elsa Triolet)

On Wed, 2007-10-17 at 22:47 -0300, Alvaro Herrera wrote:
> Ow Mun Heng wrote:
>
> > Index Cond: ((audit_key_dtime >= $0) AND (audit_key_dtime < $1))
> >
> >
> > Index Cond: ((audit_key_dtime >= '2007-08-08 18:00:00'::timestamp without time zone)
> >           AND (audit_key_dtime < '2007-08-08 18:01:00'::timestamp without time zone))
> >
> > This is _the_ only difference between the 2 queries where on one, the
> > dates are provided, and the other is selected from a table.
> >
> > I have no idea why the plans are so different between the two.
>
> The difference is that it has to consider the worst possibility in the
> second case, whereas the other one knows the interval is just one minute.

Actually the dates are just 1 min apart in both cases.

>
> I didn't read the entire thread but I've seen similar things go much
> better when you grab the constants beforehand and interpolate them into
> the query by yourself.  Yes, it's really ugly.
>

Yeah.. In the end, I just made a hack of it and did sort of like a
dymanic sql. (it is a dynamic sql anyway) where I selected the dates
into a variable (the whole query is encapsulated as a function ran at x
intervals) and then use

query_string := replace(query_string,'fromdate',date_inputed_into_variable)
query_string := replace(query_string,'todate',date_inputed_into_variable+refresh_interval)

and that got me much better performance.


Ow Mun Heng wrote:
> On Wed, 2007-10-17 at 20:37 +0100, Richard Huxton wrote:
>> Your query plans don't seem to match your queries. That makes it
>> difficult to provide meaningful advice.
>
> Well, then that makes both you and me(both) stumped. because the 2
> queries are exactly the same except for the data part.

Your queries were like this:

 > select
 > foo,
 > bar,
 > foobar
 > from
 > A,
 > join B
 > on a.id = b.id
 > join C
 > on c.id = b.id
 > and c.start_dtime = b.start_dtime
 > where audit_key_dtime >= '2007-08-08 18:00:00'
 > and   audit_key_dtime <  '2007-08-08 18:01:00'


Your plans contained these:

 >                     Filter: (pber_err_rate_hd_zn_2 <> 0::numeric)

 >                     Filter: (web_id <> 0::numeric)

I don't see the <> 0 condition anywhere in your SQL - presumably there's
a view involved somehwere?

--
   Richard Huxton
   Archonet Ltd

On Thu, 2007-10-18 at 07:28 +0100, Richard Huxton wrote:
> Ow Mun Heng wrote:
> > On Wed, 2007-10-17 at 20:37 +0100, Richard Huxton wrote:
> >> Your query plans don't seem to match your queries. That makes it
> >> difficult to provide meaningful advice.
> >
> > Well, then that makes both you and me(both) stumped. because the 2
> > queries are exactly the same except for the data part.
>
> Your queries were like this:
>
>  > select
>  > foo,
>  > bar,
>  > foobar
>  > from
>  > A,
>  > join B
>  > on a.id = b.id
>  > join C
>  > on c.id = b.id
>  > and c.start_dtime = b.start_dtime
>  > where audit_key_dtime >= '2007-08-08 18:00:00'
>  > and   audit_key_dtime <  '2007-08-08 18:01:00'
and web_id <> 0


>
>
> Your plans contained these:
>
>  >                     Filter: (pber_err_rate_hd_zn_2 <> 0::numeric)
>
>  >                     Filter: (web_id <> 0::numeric)
>
> I don't see the <> 0 condition anywhere in your SQL - presumably there's
> a view involved somehwere?


My apologies, in an effort to disguise the column_names, I missed out
the 1st one.

There is no View it's just a filtering condition. Having said that, the
only difference between the 2 queries are just the dates. (or the way
the dates are provided to the query..)


Ow Mun Heng wrote:
>
> On Wed, 2007-10-17 at 22:47 -0300, Alvaro Herrera wrote:
> > Ow Mun Heng wrote:
> >
> > > Index Cond: ((audit_key_dtime >= $0) AND (audit_key_dtime < $1))
> > >
> > >
> > > Index Cond: ((audit_key_dtime >= '2007-08-08 18:00:00'::timestamp without time zone)
> > >           AND (audit_key_dtime < '2007-08-08 18:01:00'::timestamp without time zone))
> > >
> > > This is _the_ only difference between the 2 queries where on one, the
> > > dates are provided, and the other is selected from a table.
> > >
> > > I have no idea why the plans are so different between the two.
> >
> > The difference is that it has to consider the worst possibility in the
> > second case, whereas the other one knows the interval is just one minute.
>
> Actually the dates are just 1 min apart in both cases.

Of course.  My point is that the planner doesn't know that in the first
case.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.