Thread: Index scan startup time
[Apologies if this already went through. I don't see it in the archives.] Normally one expects that an index scan would have a startup time of nearly zero. Can anyone explain this: EXPLAIN ANALYZE select activity_id from activity where state in (10000, 10001) order by activity_id limit 100; QUERY PLAN Limit (cost=0.00..622.72 rows=100 width=8) (actual time=207356.054..207356.876 rows=100 loops=1) -> Index Scan using activity_pk on activity (cost=0.00..40717259.91 rows=6538650 width=8) (actual time=207356.050..207356.722 rows=100 loops=1) Filter: ((state = 10000) OR (state = 10001)) Total runtime: 207357.000 ms The table has seen VACUUM FULL and REINDEX before this. The plan choice and the statistics look right, but why does it take 3 minutes before doing anything? Or is the measurement of the actual start time inaccurate? This is quite reproducible, so it's not just a case of a temporary I/O bottleneck, say. (PostgreSQL 8.0.3) -- Peter Eisentraut http://developer.postgresql.org/~petere/
On Thu, Mar 30, 2006 at 01:59:10PM +0200, Peter Eisentraut wrote: > EXPLAIN ANALYZE select activity_id from activity where state in (10000, 10001) > order by activity_id limit 100; > > QUERY PLAN > > Limit (cost=0.00..622.72 rows=100 width=8) (actual > time=207356.054..207356.876 rows=100 loops=1) > -> Index Scan using activity_pk on activity (cost=0.00..40717259.91 > rows=6538650 width=8) (actual time=207356.050..207356.722 rows=100 loops=1) > Filter: ((state = 10000) OR (state = 10001)) > Total runtime: 207357.000 ms > > The table has seen VACUUM FULL and REINDEX before this. The index scan is by activity_id, not by state. Do you have an index on state at all? /* Steinar */ -- Homepage: http://www.sesse.net/
On Thu, Mar 30, 2006 at 01:59:10PM +0200, Peter Eisentraut wrote: >The table has seen VACUUM FULL and REINDEX before this. But no analyze? Mike Stone
Am Donnerstag, 30. März 2006 14:02 schrieb Steinar H. Gunderson: > On Thu, Mar 30, 2006 at 01:59:10PM +0200, Peter Eisentraut wrote: > > EXPLAIN ANALYZE select activity_id from activity where state in (10000, > > 10001) order by activity_id limit 100; > > > > QUERY PLAN > > > > Limit (cost=0.00..622.72 rows=100 width=8) (actual > > time=207356.054..207356.876 rows=100 loops=1) > > -> Index Scan using activity_pk on activity (cost=0.00..40717259.91 > > rows=6538650 width=8) (actual time=207356.050..207356.722 rows=100 > > loops=1) Filter: ((state = 10000) OR (state = 10001)) > > Total runtime: 207357.000 ms > > > > The table has seen VACUUM FULL and REINDEX before this. > > The index scan is by activity_id, not by state. Do you have an index on > state at all? There is an index on state as well but the column is not selective enough. -- Peter Eisentraut http://developer.postgresql.org/~petere/
Am Donnerstag, 30. März 2006 14:06 schrieb Michael Stone: > On Thu, Mar 30, 2006 at 01:59:10PM +0200, Peter Eisentraut wrote: > >The table has seen VACUUM FULL and REINDEX before this. > > But no analyze? ANALYZE as well, but the plan choice is not the point anyway. -- Peter Eisentraut http://developer.postgresql.org/~petere/
On Thu, Mar 30, 2006 at 02:23:53PM +0200, Peter Eisentraut wrote: >>> EXPLAIN ANALYZE select activity_id from activity where state in (10000, >>> 10001) order by activity_id limit 100; >>> >>> QUERY PLAN >>> >>> Limit (cost=0.00..622.72 rows=100 width=8) (actual >>> time=207356.054..207356.876 rows=100 loops=1) >>> -> Index Scan using activity_pk on activity (cost=0.00..40717259.91 >>> rows=6538650 width=8) (actual time=207356.050..207356.722 rows=100 >>> loops=1) Filter: ((state = 10000) OR (state = 10001)) >>> Total runtime: 207357.000 ms >>> >>> The table has seen VACUUM FULL and REINDEX before this. >> The index scan is by activity_id, not by state. Do you have an index on >> state at all? > There is an index on state as well but the column is not selective enough. Well, it's logical enough; it scans along activity_id until it finds one with state=10000 or state=10001. You obviously have a _lot_ of records with low activity_id and state none of these two, so Postgres needs to scan all those records before it founds 100 it can output. This is the “startup cost” you're seeing. /* Steinar */ -- Homepage: http://www.sesse.net/
Hi, Peter, Peter Eisentraut wrote: >>>The table has seen VACUUM FULL and REINDEX before this. >>But no analyze? > ANALYZE as well, but the plan choice is not the point anyway. Maybe you could add a combined Index on activity_id and state, or (if you use this kind of query more often) a conditional index on activity_id where state in (10000,10001). Btw, PostgreSQL 8.1 could AND two bitmap index scans on the activity and state indices, and get the result faster (i presume). Markus -- Markus Schaber | Logical Tracking&Tracing International AG Dipl. Inf. | Software Development GIS Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org
On Thu, Mar 30, 2006 at 02:31:34PM +0200, Steinar H. Gunderson wrote: >Well, it's logical enough; it scans along activity_id until it finds one with >state=10000 or state=10001. You obviously have a _lot_ of records with low >activity_id and state none of these two, so Postgres needs to scan all those >records before it founds 100 it can output. This is the “startup cost” you're >seeing. Yes. And the estimates are bad enough (orders of magnitude) that I can't help but wonder whether pg could come up with a better plan with better statistics: >>>> -> Index Scan using activity_pk on activity (cost=0.00..40717259.91 rows=6538650 width=8) (actual time=207356.050..207356.722rows=100 loops=1) Filter: ((state = 10000) OR (state = 10001)) Mike Stone
On Thu, Mar 30, 2006 at 07:42:53AM -0500, Michael Stone wrote: > Yes. And the estimates are bad enough (orders of magnitude) that I can't > help but wonder whether pg could come up with a better plan with better > statistics: > >>>>> -> Index Scan using activity_pk on activity (cost=0.00..40717259.91 >>>>> rows=6538650 width=8) (actual time=207356.050..207356.722 rows=100 >>>>> loops=1) Filter: ((state = 10000) OR (state = 10001)) Isn't the rows=100 here because of the LIMIT? /* Steinar */ -- Homepage: http://www.sesse.net/
Am Donnerstag, 30. März 2006 14:31 schrieb Steinar H. Gunderson: > Well, it's logical enough; it scans along activity_id until it finds one > with state=10000 or state=10001. You obviously have a _lot_ of records with > low activity_id and state none of these two, so Postgres needs to scan all > those records before it founds 100 it can output. This is the “startup > cost” you're seeing. The startup cost is the cost until the plan is set up to start outputting rows. It is not the time until the first row is found. -- Peter Eisentraut http://developer.postgresql.org/~petere/
On Thu, Mar 30, 2006 at 02:59:02PM +0200, Peter Eisentraut wrote: >> Well, it's logical enough; it scans along activity_id until it finds one >> with state=10000 or state=10001. You obviously have a _lot_ of records with >> low activity_id and state none of these two, so Postgres needs to scan all >> those records before it founds 100 it can output. This is the “startup >> cost” you're seeing. > The startup cost is the cost until the plan is set up to start outputting > rows. It is not the time until the first row is found. Well, point, my terminology was wrong. Still, what you're seeing is endless scanning for the first row. I don't know your distribution, but are you really sure state wouldn't have better selectivity? /* Steinar */ -- Homepage: http://www.sesse.net/
On Thu, Mar 30, 2006 at 02:51:47PM +0200, Steinar H. Gunderson wrote: >On Thu, Mar 30, 2006 at 07:42:53AM -0500, Michael Stone wrote: >> Yes. And the estimates are bad enough (orders of magnitude) that I can't >> help but wonder whether pg could come up with a better plan with better >> statistics: >> >>>>>> -> Index Scan using activity_pk on activity (cost=0.00..40717259.91 >>>>>> rows=6538650 width=8) (actual time=207356.050..207356.722 rows=100 >>>>>> loops=1) Filter: ((state = 10000) OR (state = 10001)) > >Isn't the rows=100 here because of the LIMIT? Yes. I was looking at the other side; I thought pg could estimate how much work it would have to do to hit the limit, but double-checking it looks like it can't. Mike Stone
Michael Stone <mstone+postgres@mathom.us> writes: > Yes. I was looking at the other side; I thought pg could estimate how > much work it would have to do to hit the limit, but double-checking it > looks like it can't. Yes, it does, you just have to understand how to interpret the EXPLAIN output. Peter had Limit (cost=0.00..622.72 rows=100 width=8) (actual time=207356.054..207356.876 rows=100 loops=1) -> Index Scan using activity_pk on activity (cost=0.00..40717259.91 rows=6538650 width=8) (actual time=207356.050..207356.722rows=100 loops=1) Filter: ((state = 10000) OR (state = 10001)) Total runtime: 207357.000 ms Notice that the total cost of the LIMIT node is estimated as far less than the total cost of the IndexScan node. That's exactly because the planner doesn't expect the indexscan to run to completion. The problem here appears to be a non-random correlation between state and activity, such that the desired state values are not randomly scattered in the activity sequence. The planner doesn't know about that correlation and hence can't predict the poor startup time. regards, tom lane
Tom Lane wrote: > The problem here appears to be a non-random correlation between state > and activity, such that the desired state values are not randomly > scattered in the activity sequence. The planner doesn't know about > that correlation and hence can't predict the poor startup time. So from when to when is the startup time (the "x" in "x..y") actually measured? When does the clock start ticking and when does it stop? That is what's confusing me. -- Peter Eisentraut http://developer.postgresql.org/~petere/
Peter Eisentraut <peter_e@gmx.net> writes: > So from when to when is the startup time (the "x" in "x..y") actually > measured? When does the clock start ticking and when does it stop? > That is what's confusing me. The planner thinks of the startup time (the first estimated-cost number) as the time before the output scan can start, eg, time to do the sort in a sort node. EXPLAIN ANALYZE however reports the actual time until the first output row is delivered. When you've got a filter applied to the node result, as in this case, there can be a material difference between the two definitions, because of the time spent scanning rows that don't get past the filter. regards, tom lane
On Thu, 2006-03-30 at 13:59 +0200, Peter Eisentraut wrote: > Can anyone explain this: > > EXPLAIN ANALYZE select activity_id from activity where state in (10000, 10001) > order by activity_id limit 100; > > QUERY PLAN > > Limit (cost=0.00..622.72 rows=100 width=8) (actual > time=207356.054..207356.876 rows=100 loops=1) > -> Index Scan using activity_pk on activity (cost=0.00..40717259.91 > rows=6538650 width=8) (actual time=207356.050..207356.722 rows=100 loops=1) > Filter: ((state = 10000) OR (state = 10001)) > Total runtime: 207357.000 ms > ...just adding to Tom's comments: The interesting thing about this query is it *looks* like the index is being used to retrieve the matching rows and so the startup time looks wrong. However the index is being used instead of a sort to satisfy the ORDER BY, with the state clauses being applied as after-scan filters since those columns aren't part of the index. So the Index Scan starts at the leftmost page and scans the whole index... If the query had chosen a sort, the startup time would have been easily understandable, but there's no indication from the EXPLAIN as to why the Index Scan exists. Perhaps it should be a TODO item to make the EXPLAIN say explicitly when an Index Scan is being used to provide sorted output? Best Regards, Simon Riggs