Thread: timestamp query doesn't use an index ...
I'm trying to figure out some way to speed up the following query: select ps2.page_id, ps2.template_component_id, max(ps2.start_time) from page_schedule ps2 where ps2.start_time attime zone 'MST7MDT' <= '2006-5-17 8:9:18' group by ps2.page_id, ps2.template_component_id When run through EXPLAIN ANALYZE, it shows: QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------- HashAggregate (cost=2613.28..2614.17 rows=72 width=16) (actual time=976.629..976.938 rows=128 loops=1) -> Seq Scan onpage_schedule ps2 (cost=0.00..2364.95 rows=33110 width=16) (actual time=0.021..623.363 rows=94798 loops=1) Filter:(timezone('MST7MDT'::text, start_time) <= '2006-05-17 08:09:18'::timestamp without time zone) Total runtime: 977.224ms (4 rows) I've tried doing a function index, like: create index start_time_page_schedule on page_schedule using btree ( timezone('MST7MDT'::text, start_time) ); But, same result ... whereas, if I change the <= to just =, the index is used, but that is expected ... Is there some other way I can either write above query *or* do an index, such that it will use the index? thanks ... ---- Marc G. Fournier Hub.Org Networking Services (http://www.hub.org) Email . scrappy@hub.org MSN . scrappy@hub.org Yahoo . yscrappy Skype: hub.org ICQ . 7615664
On May 21, 2006, at 10:42 , Marc G. Fournier wrote: > -> Seq Scan on page_schedule ps2 (cost=0.00..2364.95 > rows=33110 width=16) (actual time=0.021..623.363 rows=94798 loops=1) I don't know about rewriting the query, but it appears your statistics are a little out of date (e.g., rows expected/actual 33110/94798). Does running ANALYZE help? Michael Glaesemann grzm seespotcode net
On Sun, 21 May 2006, Michael Glaesemann wrote: > > On May 21, 2006, at 10:42 , Marc G. Fournier wrote: > >> -> Seq Scan on page_schedule ps2 (cost=0.00..2364.95 rows=33110 >> width=16) (actual time=0.021..623.363 rows=94798 loops=1) > > I don't know about rewriting the query, but it appears your statistics are a > little out of date (e.g., rows expected/actual 33110/94798). Does running > ANALYZE help? the data is idle, just loaded it on my desktop for testing purposes ... being paranoid, I have been doing a vacuum analyze on the table as I change the index's *just in case*, but, doing a full analyze on the whole database doesn't change the results any: Actually, the above results are odd anyway, since a second run of the exact same query, shows more normal numbers: QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------- HashAggregate (cost=3051.91..3054.19 rows=183 width=16) (actual time=1030.970..1031.257 rows=128 loops=1) -> Seq Scanon page_schedule ps2 (cost=0.00..2364.95 rows=91594 width=16) (actual time=0.019..636.599 rows=94798 loops=1) Filter: (timezone('MST7MDT'::text, start_time) <= '2006-05-17 08:09:18'::timestamp without time zone) Total runtime: 1031.681ms (4 rows) So not 100% certain where the 33110/94798 gap came from ;) ---- Marc G. Fournier Hub.Org Networking Services (http://www.hub.org) Email . scrappy@hub.org MSN . scrappy@hub.org Yahoo . yscrappy Skype: hub.org ICQ . 7615664
"Marc G. Fournier" <scrappy@postgresql.org> writes: > I'm trying to figure out some way to speed up the following query: > select ps2.page_id, ps2.template_component_id, max(ps2.start_time) > from page_schedule ps2 > where ps2.start_time at time zone 'MST7MDT' <= '2006-5-17 8:9:18' > group by ps2.page_id, ps2.template_component_id > Is there some other way I can either write above query *or* do an index, > such that it will use the index? One-sided inequalities frequently *shouldn't* use an index, because they're retrieving too much of the table. Are you sure this is fetching only a small fraction of the table? Are you using PG 8.1 (8.1 would be likely to try to use a bitmap indexscan for this)? You could experiment with enable_seqscan = off to see if the planner is actually wrong about its choice. If so, reducing random_page_cost might be the best permanent solution. regards, tom lane
On Sun, 21 May 2006, Tom Lane wrote: > "Marc G. Fournier" <scrappy@postgresql.org> writes: >> I'm trying to figure out some way to speed up the following query: > >> select ps2.page_id, ps2.template_component_id, max(ps2.start_time) >> from page_schedule ps2 >> where ps2.start_time at time zone 'MST7MDT' <= '2006-5-17 8:9:18' >> group by ps2.page_id, ps2.template_component_id > >> Is there some other way I can either write above query *or* do an index, >> such that it will use the index? > > One-sided inequalities frequently *shouldn't* use an index, because > they're retrieving too much of the table. Are you sure this is fetching > only a small fraction of the table? Are you using PG 8.1 (8.1 would be > likely to try to use a bitmap indexscan for this)? > > You could experiment with enable_seqscan = off to see if the planner is > actually wrong about its choice. If so, reducing random_page_cost might > be the best permanent solution. vrnprd=# select version(); version ------------------------------------------------------------------------------------------------ PostgreSQL 8.1.3 on i386-portbld-freebsd6.1,compiled by GCC cc (GCC) 3.4.4 [FreeBSD] 20050518 (1 row) vrnprd=# set enable_seqscan = off; SET vrnprd=# explain analyze select ps2.page_id, ps2.template_component_id, max(ps2.start_time) from page_schedule ps2 where ps2.start_time at time zone 'MST7MDT' <= '2006-5-17 8:9:18' group by ps2.page_id, ps2.template_component_id ; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------------- HashAggregate (cost=3509.96..3513.50 rows=283 width=16) (actual time=839.460..839.769 rows=128 loops=1) -> Bitmap HeapScan on page_schedule ps2 (cost=573.65..2822.86 rows=91614 width=16) (actual time=149.788..505.438 rows=94798 loops=1) Recheck Cond: (timezone('MST7MDT'::text, start_time) <= '2006-05-17 08:09:18'::timestamp without time zone) -> Bitmap Index Scan on start_time_page_schedule (cost=0.00..573.65 rows=91614 width=0) (actual time=127.761..127.761rows=94798 loops=1) Index Cond: (timezone('MST7MDT'::text, start_time) <= '2006-05-1708:09:18'::timestamp without time zone) Total runtime: 846.604 ms (6 rows) vrnprd=# And yup, it is definitely returning just 128 rows out of the 93k or so: 110 | 419 | 2005-10-26 13:15:00-03 130 | 215 | 2006-04-26 10:15:00-03 (128 rows) And, not sure how far to reduce random_page_cost, but it was 4 ... tried 2 and 1, and both come up with the same results ... with seqscan enabled, it does a seqscan :( I suspected with the <= there wasn't going to be much I could do with this, but figured I'd make sure there wasn't something that I was overlooking :( Thx ... ---- Marc G. Fournier Hub.Org Networking Services (http://www.hub.org) Email . scrappy@hub.org MSN . scrappy@hub.org Yahoo . yscrappy Skype: hub.org ICQ . 7615664
"Marc G. Fournier" <scrappy@postgresql.org> writes: > -> Bitmap Heap Scan on page_schedule ps2 (cost=573.65..2822.86 rows=91614 width=16) (actual time=149.788..505.438rows=94798 loops=1) > Recheck Cond: (timezone('MST7MDT'::text, start_time) <= '2006-05-17 08:09:18'::timestamp without time zone) > -> Bitmap Index Scan on start_time_page_schedule (cost=0.00..573.65 rows=91614 width=0) (actual time=127.761..127.761rows=94798 loops=1) > Index Cond: (timezone('MST7MDT'::text, start_time) <= '2006-05-17 08:09:18'::timestamp without time zone) > And yup, it is definitely returning just 128 rows out of the 93k or so: No, the scan is pulling 94798 rows from the table, according to the EXPLAIN ANALYZE --- the number of resulting groups isn't much of a factor here. We don't currently have any index optimization for MIN/MAX in a GROUP BY context, and even if we did, it wouldn't apply here: the planner couldn't assume that the sort order of an index on "start_time at time zone 'MST7MDT'" would have anything to do with the ordering of just "start_time". Is there a reason you're writingwhere ps2.start_time at time zone 'MST7MDT' <= '2006-5-17 8:9:18' and notwhere ps2.start_time <= '2006-5-17 8:9:18' at time zone 'MST7MDT' The latter seems less likely to have strange behaviors near DST transitions. I don't think it'll be any faster at the moment, but you could at least save maintaining a specialized index. regards, tom lane