Thread: timestamp query doesn't use an index ...

timestamp query doesn't use an index ...

From
"Marc G. Fournier"
Date:
I'm trying to figure out some way to speed up the following query:
  select ps2.page_id, ps2.template_component_id, max(ps2.start_time)    from page_schedule ps2   where ps2.start_time
attime zone 'MST7MDT' <= '2006-5-17 8:9:18'
 
group by ps2.page_id, ps2.template_component_id

When run through EXPLAIN ANALYZE, it shows:
                                                          QUERY PLAN 

-------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=2613.28..2614.17 rows=72 width=16) (actual time=976.629..976.938 rows=128 loops=1)   ->  Seq Scan
onpage_schedule ps2  (cost=0.00..2364.95 rows=33110 width=16) (actual time=0.021..623.363 rows=94798 loops=1)
Filter:(timezone('MST7MDT'::text, start_time) <= '2006-05-17 08:09:18'::timestamp without time zone) Total runtime:
977.224ms
 
(4 rows)

I've tried doing a function index, like:

create index start_time_page_schedule          on page_schedule       using btree ( timezone('MST7MDT'::text,
start_time) );
 

But, same result ... whereas, if I change the <= to just =, the index is 
used, but that is expected ...

Is there some other way I can either write above query *or* do an index, 
such that it will use the index?

thanks ...

----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email . scrappy@hub.org                              MSN . scrappy@hub.org
Yahoo . yscrappy               Skype: hub.org        ICQ . 7615664


Re: timestamp query doesn't use an index ...

From
Michael Glaesemann
Date:
On May 21, 2006, at 10:42 , Marc G. Fournier wrote:

>    ->  Seq Scan on page_schedule ps2  (cost=0.00..2364.95  
> rows=33110 width=16) (actual time=0.021..623.363 rows=94798 loops=1)

I don't know about rewriting the query, but it appears your  
statistics are a little out of date (e.g., rows expected/actual  
33110/94798). Does running ANALYZE help?


Michael Glaesemann
grzm seespotcode net





Re: timestamp query doesn't use an index ...

From
"Marc G. Fournier"
Date:
On Sun, 21 May 2006, Michael Glaesemann wrote:

>
> On May 21, 2006, at 10:42 , Marc G. Fournier wrote:
>
>>   ->  Seq Scan on page_schedule ps2  (cost=0.00..2364.95 rows=33110 
>> width=16) (actual time=0.021..623.363 rows=94798 loops=1)
>
> I don't know about rewriting the query, but it appears your statistics are a 
> little out of date (e.g., rows expected/actual 33110/94798). Does running 
> ANALYZE help?

the data is idle, just loaded it on my desktop for testing purposes ... 
being paranoid, I have been doing a vacuum analyze on the table as I 
change the index's *just in case*, but, doing a full analyze on the whole 
database doesn't change the results any:

Actually, the above results are odd anyway, since a second run of the 
exact same query, shows more normal numbers:
                                                          QUERY PLAN 

-------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=3051.91..3054.19 rows=183 width=16) (actual time=1030.970..1031.257 rows=128 loops=1)   ->  Seq
Scanon page_schedule ps2  (cost=0.00..2364.95 rows=91594 width=16) (actual time=0.019..636.599 rows=94798 loops=1)
  Filter: (timezone('MST7MDT'::text, start_time) <= '2006-05-17 08:09:18'::timestamp without time zone) Total runtime:
1031.681ms
 
(4 rows)

So not 100% certain where the 33110/94798 gap came from ;)

----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email . scrappy@hub.org                              MSN . scrappy@hub.org
Yahoo . yscrappy               Skype: hub.org        ICQ . 7615664


Re: timestamp query doesn't use an index ...

From
Tom Lane
Date:
"Marc G. Fournier" <scrappy@postgresql.org> writes:
> I'm trying to figure out some way to speed up the following query:

>    select ps2.page_id, ps2.template_component_id, max(ps2.start_time)
>      from page_schedule ps2
>     where ps2.start_time at time zone 'MST7MDT' <= '2006-5-17 8:9:18'
> group by ps2.page_id, ps2.template_component_id

> Is there some other way I can either write above query *or* do an index, 
> such that it will use the index?

One-sided inequalities frequently *shouldn't* use an index, because
they're retrieving too much of the table.  Are you sure this is fetching
only a small fraction of the table?  Are you using PG 8.1 (8.1 would be
likely to try to use a bitmap indexscan for this)?

You could experiment with enable_seqscan = off to see if the planner is
actually wrong about its choice.  If so, reducing random_page_cost might
be the best permanent solution.
        regards, tom lane


Re: timestamp query doesn't use an index ...

From
"Marc G. Fournier"
Date:
On Sun, 21 May 2006, Tom Lane wrote:

> "Marc G. Fournier" <scrappy@postgresql.org> writes:
>> I'm trying to figure out some way to speed up the following query:
>
>>    select ps2.page_id, ps2.template_component_id, max(ps2.start_time)
>>      from page_schedule ps2
>>     where ps2.start_time at time zone 'MST7MDT' <= '2006-5-17 8:9:18'
>> group by ps2.page_id, ps2.template_component_id
>
>> Is there some other way I can either write above query *or* do an index,
>> such that it will use the index?
>
> One-sided inequalities frequently *shouldn't* use an index, because
> they're retrieving too much of the table.  Are you sure this is fetching
> only a small fraction of the table?  Are you using PG 8.1 (8.1 would be
> likely to try to use a bitmap indexscan for this)?
>
> You could experiment with enable_seqscan = off to see if the planner is
> actually wrong about its choice.  If so, reducing random_page_cost might
> be the best permanent solution.

vrnprd=# select version();                                            version 
------------------------------------------------------------------------------------------------ PostgreSQL 8.1.3 on
i386-portbld-freebsd6.1,compiled by GCC cc (GCC) 3.4.4 [FreeBSD] 20050518
 
(1 row)

vrnprd=# set enable_seqscan = off;
SET
vrnprd=# explain analyze  select ps2.page_id, ps2.template_component_id, max(ps2.start_time)    from page_schedule ps2
where ps2.start_time at time zone 'MST7MDT' <= '2006-5-17 8:9:18'
 
group by ps2.page_id, ps2.template_component_id
;                                                                     QUERY PLAN 

-----------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=3509.96..3513.50 rows=283 width=16) (actual time=839.460..839.769 rows=128 loops=1)   ->  Bitmap
HeapScan on page_schedule ps2  (cost=573.65..2822.86 rows=91614 width=16) (actual time=149.788..505.438 rows=94798
loops=1)        Recheck Cond: (timezone('MST7MDT'::text, start_time) <= '2006-05-17 08:09:18'::timestamp without time
zone)        ->  Bitmap Index Scan on start_time_page_schedule  (cost=0.00..573.65 rows=91614 width=0) (actual
time=127.761..127.761rows=94798 loops=1)               Index Cond: (timezone('MST7MDT'::text, start_time) <=
'2006-05-1708:09:18'::timestamp without time zone) Total runtime: 846.604 ms
 
(6 rows)

vrnprd=#

And yup, it is definitely returning just 128 rows out of the 93k or so:
     110 |                   419 | 2005-10-26 13:15:00-03     130 |                   215 | 2006-04-26 10:15:00-03
(128 rows)

And, not sure how far to reduce random_page_cost, but it was 4 ... tried 2 
and 1, and both come up with the same results ... with seqscan enabled, it 
does a seqscan :(

I suspected with the <= there wasn't going to be much I could do with 
this, but figured I'd make sure there wasn't something that I was 
overlooking :(

Thx ...


----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email . scrappy@hub.org                              MSN . scrappy@hub.org
Yahoo . yscrappy               Skype: hub.org        ICQ . 7615664


Re: timestamp query doesn't use an index ...

From
Tom Lane
Date:
"Marc G. Fournier" <scrappy@postgresql.org> writes:
>     ->  Bitmap Heap Scan on page_schedule ps2  (cost=573.65..2822.86 rows=91614 width=16) (actual
time=149.788..505.438rows=94798 loops=1)
 
>           Recheck Cond: (timezone('MST7MDT'::text, start_time) <= '2006-05-17 08:09:18'::timestamp without time
zone)
>           ->  Bitmap Index Scan on start_time_page_schedule  (cost=0.00..573.65 rows=91614 width=0) (actual
time=127.761..127.761rows=94798 loops=1)
 
>                 Index Cond: (timezone('MST7MDT'::text, start_time) <= '2006-05-17 08:09:18'::timestamp without time
zone)

> And yup, it is definitely returning just 128 rows out of the 93k or so:

No, the scan is pulling 94798 rows from the table, according to the
EXPLAIN ANALYZE --- the number of resulting groups isn't much of a
factor here.

We don't currently have any index optimization for MIN/MAX in a GROUP BY
context, and even if we did, it wouldn't apply here: the planner
couldn't assume that the sort order of an index on "start_time at time
zone 'MST7MDT'" would have anything to do with the ordering of just
"start_time".  Is there a reason you're writingwhere ps2.start_time at time zone 'MST7MDT' <= '2006-5-17 8:9:18'
and notwhere ps2.start_time <= '2006-5-17 8:9:18' at time zone 'MST7MDT'
The latter seems less likely to have strange behaviors near DST
transitions.  I don't think it'll be any faster at the moment, but you
could at least save maintaining a specialized index.
        regards, tom lane