Thread: Cost of indexscan

Cost of indexscan

From
Kari Lavikka
Date:
Hi,

Postgres seems to estimate the cost of indexscan to be a bit too high.
The table has something like 500000 rows and I have run reindex and vacuum
analyze recently. Is there something to tune?

Index is a multicolumn index:
    "admin_event_stamp_event_type_name_status" btree (stamp, event_type_name, status)

Singlecolumn index for stamp doesn't make a significant difference in cost
estimation.

               -- -- -- -- -- -- -- -- -- --

galleria=> set enable_seqscan = true;
SET
galleria=> explain analyze SELECT * FROM admin_event WHERE stamp > (current_timestamp - '1 days'::INTERVAL)::TIMESTAMP
WITHOUTTIME ZONE; 
                                                       QUERY PLAN

------------------------------------------------------------------------------------------------------------------------
 Seq Scan on admin_event  (cost=0.00..19844.37 rows=154361 width=109) (actual time=479.173..2760.186 rows=4705 loops=1)
   Filter: (stamp > ((('now'::text)::timestamp(6) with time zone - '1 day'::interval))::timestamp without time zone)
 Total runtime: 2765.428 ms
(3 rows)

galleria=> set enable_seqscan = false;
SET
galleria=> explain analyze SELECT * FROM admin_event WHERE stamp > (current_timestamp - '1 days'::INTERVAL)::TIMESTAMP
WITHOUTTIME ZONE; 

QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using admin_event_stamp_event_type_name_status on admin_event  (cost=0.00..540690.18 rows=154361 width=109)
(actualtime=7.771..124.886 rows=4706 loops=1) 
   Index Cond: (stamp > ((('now'::text)::timestamp(6) with time zone - '1 day'::interval))::timestamp without time
zone)
 Total runtime: 82.530 ms
(3 rows)

               -- -- -- -- -- -- -- -- -- --

Distribution of stamp looks like the following:

galleria=> SELECT date_trunc('month', stamp)::DATE, count(*), repeat('*', (count(*) / 3000)::INTEGER) FROM admin_event
GROUPBY date_trunc('month', stamp)::DATE ORDER BY 1; 
 date_trunc | count  |                  repeat
------------+--------+-------------------------------------------
 2002-01-01 |   2013 |
 2002-02-01 |   2225 |
 2002-03-01 |   2165 |
 2002-04-01 |   2692 |
 2002-05-01 |   3031 | *
 2002-06-01 |   2376 |
 2002-07-01 |   2694 |
 2002-08-01 |   4241 | *
 2002-09-01 |   4140 | *
 2002-10-01 |   4964 | *
 2002-11-01 |   8769 | **
 2002-12-01 |  13249 | ****
 2003-01-01 |   5776 | *
 2003-02-01 |   6301 | **
 2003-03-01 |   6404 | **
 2003-04-01 |   6905 | **
 2003-05-01 |   7119 | **
 2003-06-01 |   8978 | **
 2003-07-01 |   7723 | **
 2003-08-01 |  36566 | ************
 2003-09-01 |  15759 | *****
 2003-10-01 |  10610 | ***
 2003-11-01 |  83113 | ***************************
 2003-12-01 |  90927 | ******************************
 2004-01-01 | 124479 | *****************************************


    |\__/|
    ( oo )    Kari Lavikka - tuner@bdb.fi - (050) 380 3808
__ooO(  )Ooo_______ _____ ___ _ _  _   _    _      _                  _
      ""

Re: Cost of indexscan

From
Tom Lane
Date:
Kari Lavikka <tuner@bdb.fi> writes:
> Postgres seems to estimate the cost of indexscan to be a bit too high.
> The table has something like 500000 rows and I have run reindex and vacuum
> analyze recently. Is there something to tune?

I think the real problem here is that the row estimate is off by a
factor of thirty:

>  Seq Scan on admin_event  (cost=0.00..19844.37 rows=154361 width=109) (actual time=479.173..2760.186 rows=4705
loops=1)

With a correct estimate the indexscan would have been chosen.

> galleria=> explain analyze SELECT * FROM admin_event WHERE stamp > (current_timestamp - '1
days'::INTERVAL)::TIMESTAMPWITHOUT TIME ZONE; 

It's not possible for the planner to make a good guess here since it
doesn't know what the comparison value for the stamp column is.
(current_timestamp isn't a constant and so the comparison expression
can't be reduced to a constant at plan time.)

The traditional solution for this is to cheat:

create function ago(interval) returns timestamp without time zone as
'select localtimestamp - $1' language sql strict immutable;

select * from admin_event where stamp > ago('1 days');

This works because the function is mislabeled as immutable, encouraging
the planner to fold the result to a constant on sight.  It also has the
pleasant property of making your query more readable.  The downside is
that you are in fact lying to the system about the behavior of the ago()
function, and so you can get in trouble.  This only really works for
queries executed interactively --- you can't use this method inside
plpgsql functions, for instance.

> Distribution of stamp looks like the following:

Hm, you might also find that increasing the statistics target for stamp
would be a good idea, since its distribution is so skewed.  But unless
you do something like the above, the statistics won't get used anyway...

            regards, tom lane