Thread: Index scan startup time

Index scan startup time

From
Peter Eisentraut
Date:
[Apologies if this already went through.  I don't see it in the archives.]

Normally one expects that an index scan would have a startup time of nearly
zero.  Can anyone explain this:

EXPLAIN ANALYZE select activity_id from activity where state in (10000, 10001)
order by activity_id limit 100;

QUERY PLAN

Limit  (cost=0.00..622.72 rows=100 width=8) (actual
time=207356.054..207356.876 rows=100 loops=1)
  ->  Index Scan using activity_pk on activity  (cost=0.00..40717259.91
rows=6538650 width=8) (actual time=207356.050..207356.722 rows=100 loops=1)
        Filter: ((state = 10000) OR (state = 10001))
Total runtime: 207357.000 ms

The table has seen VACUUM FULL and REINDEX before this.

The plan choice and the statistics look right, but why does it take 3 minutes
before doing anything?  Or is the measurement of the actual start time
inaccurate?  This is quite reproducible, so it's not just a case of a
temporary I/O bottleneck, say.

(PostgreSQL 8.0.3)

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

Re: Index scan startup time

From
"Steinar H. Gunderson"
Date:
On Thu, Mar 30, 2006 at 01:59:10PM +0200, Peter Eisentraut wrote:
> EXPLAIN ANALYZE select activity_id from activity where state in (10000, 10001)
> order by activity_id limit 100;
>
> QUERY PLAN
>
> Limit  (cost=0.00..622.72 rows=100 width=8) (actual
> time=207356.054..207356.876 rows=100 loops=1)
>   ->  Index Scan using activity_pk on activity  (cost=0.00..40717259.91
> rows=6538650 width=8) (actual time=207356.050..207356.722 rows=100 loops=1)
>         Filter: ((state = 10000) OR (state = 10001))
> Total runtime: 207357.000 ms
>
> The table has seen VACUUM FULL and REINDEX before this.

The index scan is by activity_id, not by state. Do you have an index on state
at all?

/* Steinar */
--
Homepage: http://www.sesse.net/

Re: Index scan startup time

From
Michael Stone
Date:
On Thu, Mar 30, 2006 at 01:59:10PM +0200, Peter Eisentraut wrote:
>The table has seen VACUUM FULL and REINDEX before this.

But no analyze?

Mike Stone

Re: Index scan startup time

From
Peter Eisentraut
Date:
Am Donnerstag, 30. März 2006 14:02 schrieb Steinar H. Gunderson:
> On Thu, Mar 30, 2006 at 01:59:10PM +0200, Peter Eisentraut wrote:
> > EXPLAIN ANALYZE select activity_id from activity where state in (10000,
> > 10001) order by activity_id limit 100;
> >
> > QUERY PLAN
> >
> > Limit  (cost=0.00..622.72 rows=100 width=8) (actual
> > time=207356.054..207356.876 rows=100 loops=1)
> >   ->  Index Scan using activity_pk on activity  (cost=0.00..40717259.91
> > rows=6538650 width=8) (actual time=207356.050..207356.722 rows=100
> > loops=1) Filter: ((state = 10000) OR (state = 10001))
> > Total runtime: 207357.000 ms
> >
> > The table has seen VACUUM FULL and REINDEX before this.
>
> The index scan is by activity_id, not by state. Do you have an index on
> state at all?

There is an index on state as well but the column is not selective enough.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

Re: Index scan startup time

From
Peter Eisentraut
Date:
Am Donnerstag, 30. März 2006 14:06 schrieb Michael Stone:
> On Thu, Mar 30, 2006 at 01:59:10PM +0200, Peter Eisentraut wrote:
> >The table has seen VACUUM FULL and REINDEX before this.
>
> But no analyze?

ANALYZE as well, but the plan choice is not the point anyway.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

Re: Index scan startup time

From
"Steinar H. Gunderson"
Date:
On Thu, Mar 30, 2006 at 02:23:53PM +0200, Peter Eisentraut wrote:
>>> EXPLAIN ANALYZE select activity_id from activity where state in (10000,
>>> 10001) order by activity_id limit 100;
>>>
>>> QUERY PLAN
>>>
>>> Limit  (cost=0.00..622.72 rows=100 width=8) (actual
>>> time=207356.054..207356.876 rows=100 loops=1)
>>>   ->  Index Scan using activity_pk on activity  (cost=0.00..40717259.91
>>> rows=6538650 width=8) (actual time=207356.050..207356.722 rows=100
>>> loops=1) Filter: ((state = 10000) OR (state = 10001))
>>> Total runtime: 207357.000 ms
>>>
>>> The table has seen VACUUM FULL and REINDEX before this.
>> The index scan is by activity_id, not by state. Do you have an index on
>> state at all?
> There is an index on state as well but the column is not selective enough.

Well, it's logical enough; it scans along activity_id until it finds one with
state=10000 or state=10001. You obviously have a _lot_ of records with low
activity_id and state none of these two, so Postgres needs to scan all those
records before it founds 100 it can output. This is the “startup cost” you're
seeing.

/* Steinar */
--
Homepage: http://www.sesse.net/

Re: Index scan startup time

From
Markus Schaber
Date:
Hi, Peter,

Peter Eisentraut wrote:
>>>The table has seen VACUUM FULL and REINDEX before this.
>>But no analyze?
> ANALYZE as well, but the plan choice is not the point anyway.

Maybe you could add a combined Index on activity_id and state, or (if
you use this kind of query more often) a conditional index on
activity_id where state in (10000,10001).

Btw, PostgreSQL 8.1 could AND two bitmap index scans on the activity and
state indices, and get the result faster (i presume).

Markus
--
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf.     | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

Re: Index scan startup time

From
Michael Stone
Date:
On Thu, Mar 30, 2006 at 02:31:34PM +0200, Steinar H. Gunderson wrote:
>Well, it's logical enough; it scans along activity_id until it finds one with
>state=10000 or state=10001. You obviously have a _lot_ of records with low
>activity_id and state none of these two, so Postgres needs to scan all those
>records before it founds 100 it can output. This is the “startup cost” you're
>seeing.

Yes. And the estimates are bad enough (orders of magnitude) that I can't
help but wonder whether pg could come up with a better plan with better
statistics:

>>>>   ->  Index Scan using activity_pk on activity  (cost=0.00..40717259.91 rows=6538650 width=8) (actual
time=207356.050..207356.722rows=100 loops=1) Filter: ((state = 10000) OR (state = 10001)) 

Mike Stone

Re: Index scan startup time

From
"Steinar H. Gunderson"
Date:
On Thu, Mar 30, 2006 at 07:42:53AM -0500, Michael Stone wrote:
> Yes. And the estimates are bad enough (orders of magnitude) that I can't
> help but wonder whether pg could come up with a better plan with better
> statistics:
>
>>>>>  ->  Index Scan using activity_pk on activity  (cost=0.00..40717259.91
>>>>>  rows=6538650 width=8) (actual time=207356.050..207356.722 rows=100
>>>>>  loops=1) Filter: ((state = 10000) OR (state = 10001))

Isn't the rows=100 here because of the LIMIT?

/* Steinar */
--
Homepage: http://www.sesse.net/

Re: Index scan startup time

From
Peter Eisentraut
Date:
Am Donnerstag, 30. März 2006 14:31 schrieb Steinar H. Gunderson:
> Well, it's logical enough; it scans along activity_id until it finds one
> with state=10000 or state=10001. You obviously have a _lot_ of records with
> low activity_id and state none of these two, so Postgres needs to scan all
> those records before it founds 100 it can output. This is the “startup
> cost” you're seeing.

The startup cost is the cost until the plan is set up to start outputting
rows.  It is not the time until the first row is found.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

Re: Index scan startup time

From
"Steinar H. Gunderson"
Date:
On Thu, Mar 30, 2006 at 02:59:02PM +0200, Peter Eisentraut wrote:
>> Well, it's logical enough; it scans along activity_id until it finds one
>> with state=10000 or state=10001. You obviously have a _lot_ of records with
>> low activity_id and state none of these two, so Postgres needs to scan all
>> those records before it founds 100 it can output. This is the “startup
>> cost” you're seeing.
> The startup cost is the cost until the plan is set up to start outputting
> rows.  It is not the time until the first row is found.

Well, point, my terminology was wrong. Still, what you're seeing is endless
scanning for the first row. I don't know your distribution, but are you
really sure state wouldn't have better selectivity?

/* Steinar */
--
Homepage: http://www.sesse.net/

Re: Index scan startup time

From
Michael Stone
Date:
On Thu, Mar 30, 2006 at 02:51:47PM +0200, Steinar H. Gunderson wrote:
>On Thu, Mar 30, 2006 at 07:42:53AM -0500, Michael Stone wrote:
>> Yes. And the estimates are bad enough (orders of magnitude) that I can't
>> help but wonder whether pg could come up with a better plan with better
>> statistics:
>>
>>>>>>  ->  Index Scan using activity_pk on activity  (cost=0.00..40717259.91
>>>>>>  rows=6538650 width=8) (actual time=207356.050..207356.722 rows=100
>>>>>>  loops=1) Filter: ((state = 10000) OR (state = 10001))
>
>Isn't the rows=100 here because of the LIMIT?

Yes. I was looking at the other side; I thought pg could estimate how
much work it would have to do to hit the limit, but double-checking it
looks like it can't.

Mike Stone

Re: Index scan startup time

From
Tom Lane
Date:
Michael Stone <mstone+postgres@mathom.us> writes:
> Yes. I was looking at the other side; I thought pg could estimate how
> much work it would have to do to hit the limit, but double-checking it
> looks like it can't.

Yes, it does, you just have to understand how to interpret the EXPLAIN
output.  Peter had

Limit  (cost=0.00..622.72 rows=100 width=8) (actual time=207356.054..207356.876 rows=100 loops=1)
  ->  Index Scan using activity_pk on activity  (cost=0.00..40717259.91 rows=6538650 width=8) (actual
time=207356.050..207356.722rows=100 loops=1) 
        Filter: ((state = 10000) OR (state = 10001))
Total runtime: 207357.000 ms

Notice that the total cost of the LIMIT node is estimated as far less
than the total cost of the IndexScan node.  That's exactly because the
planner doesn't expect the indexscan to run to completion.

The problem here appears to be a non-random correlation between state
and activity, such that the desired state values are not randomly
scattered in the activity sequence.  The planner doesn't know about that
correlation and hence can't predict the poor startup time.

            regards, tom lane

Re: Index scan startup time

From
Peter Eisentraut
Date:
Tom Lane wrote:
> The problem here appears to be a non-random correlation between state
> and activity, such that the desired state values are not randomly
> scattered in the activity sequence.  The planner doesn't know about
> that correlation and hence can't predict the poor startup time.

So from when to when is the startup time (the "x" in "x..y") actually
measured?  When does the clock start ticking and when does it stop?
That is what's confusing me.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

Re: Index scan startup time

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> So from when to when is the startup time (the "x" in "x..y") actually
> measured?  When does the clock start ticking and when does it stop?
> That is what's confusing me.

The planner thinks of the startup time (the first estimated-cost number)
as the time before the output scan can start, eg, time to do the sort in
a sort node.  EXPLAIN ANALYZE however reports the actual time until the
first output row is delivered.  When you've got a filter applied to the
node result, as in this case, there can be a material difference between
the two definitions, because of the time spent scanning rows that don't
get past the filter.

            regards, tom lane

Re: Index scan startup time

From
Simon Riggs
Date:
On Thu, 2006-03-30 at 13:59 +0200, Peter Eisentraut wrote:

> Can anyone explain this:
>
> EXPLAIN ANALYZE select activity_id from activity where state in (10000, 10001)
> order by activity_id limit 100;
>
> QUERY PLAN
>
> Limit  (cost=0.00..622.72 rows=100 width=8) (actual
> time=207356.054..207356.876 rows=100 loops=1)
>   ->  Index Scan using activity_pk on activity  (cost=0.00..40717259.91
> rows=6538650 width=8) (actual time=207356.050..207356.722 rows=100 loops=1)
>         Filter: ((state = 10000) OR (state = 10001))
> Total runtime: 207357.000 ms
>

...just adding to Tom's comments:

The interesting thing about this query is it *looks* like the index is
being used to retrieve the matching rows and so the startup time looks
wrong. However the index is being used instead of a sort to satisfy the
ORDER BY, with the state clauses being applied as after-scan filters
since those columns aren't part of the index. So the Index Scan starts
at the leftmost page and scans the whole index...

If the query had chosen a sort, the startup time would have been easily
understandable, but there's no indication from the EXPLAIN as to why the
Index Scan exists.

Perhaps it should be a TODO item to make the EXPLAIN say explicitly when
an Index Scan is being used to provide sorted output?

Best Regards, Simon Riggs