Hi There,
I have hit a query plan issue that I believe is a bug or under-estimation, and would like to know if there it is known or if there is any workaround…
This event_log table has 4 million rows.
“log_id” is the primary key (bigint),
there is a composite index “event_data_search” over (event::text, insert_time::datetime).
Query A:
SELECT min(log_id) FROM event_log
WHERE event='S-Create' AND
insert_time>'2013-09-15' and insert_time<'2013-09-16'
Query B:
SELECT log_id FROM event_log
WHERE event='S-Create' AND
insert_time>'2013-09-15' and insert_time<'2013-09-16'
ORDER BY log_id
What I want to achieve is Query A – get the min log_id within a range. But it is very slow, taking 10 or 20 seconds.
If I don’t limit the output to LIMIT 1 – like Query B – then it is sub-second fast.
Explain of A – take 10~20 seconds to run
Limit (cost=0.00..132.54 rows=1 width=8)
-> Index Scan using event_log_pkey on event_log (cost=0.00..1503484.33 rows=11344 width=8)
Filter: ((insert_time > '2013-09-15 00:00:00'::timestamp without time zone) AND (insert_time < '2013-09-16 00:00:00'::timestamp without time zone) AND (event = 'S-Create'::text))
Explain of B – take a few milliseconds to run
Sort (cost=41015.85..41021.52 rows=11344 width=8)
Sort Key: log_id
-> Bitmap Heap Scan on event_log (cost=311.42..40863.05 rows=11344 width=8)
Recheck Cond: ((event = 'S-Create'::text) AND (insert_time > '2013-09-15 00:00:00'::timestamp without time zone) AND (insert_time < '2013-09-16 00:00:00'::timestamp without time zone))
-> Bitmap Index Scan on event_data_search (cost=0.00..310.86 rows=11344 width=0)
Index Cond: ((event = 'S-Create'::text) AND (insert_time > '2013-09-15 00:00:00'::timestamp without time zone) AND (insert_time < '2013-09-16 00:00:00'::timestamp without time zone))
Plan of A thought that the index scan node will get the first row right at 0.00, and hence the limit node will get all the rows needed within 132.54 (because event_log_pkey are sorted)
I would like to point out that – this 0.00 estimation omits the fact that it actually takes a much longer time for the index scan node to get the first row, because 3.99M rows it comes across won’t meet the condition filter at all.
Other background info:
The event_log table has been vacuumed and analyzed.
I have PostgreSQL 9.2.4 (x64) on Windows Server 2008 R2 with me. 8GB ram. 1*Xeon E5606.
Thanks,
Sam