Re: Seqscan rather than Index

From: Tom Lane
Subject: Re: Seqscan rather than Index
Date: ,
Msg-id: 21110.1103305481@sss.pgh.pa.us
(view: Whole thread, Raw)
In response to: Re: Seqscan rather than Index  (Greg Stark)
Responses: Re: Seqscan rather than Index  (Greg Stark)
List: pgsql-performance

Tree view

Seqscan rather than Index  (Jon Anderson, )
 Re: Seqscan rather than Index  (Tom Lane, )
 Re: Seqscan rather than Index  (David Brown, )
  Re: Seqscan rather than Index  (Richard Huxton, )
   Re: Seqscan rather than Index  (Greg Stark, )
    Re: Seqscan rather than Index  (Tom Lane, )
     Re: Seqscan rather than Index  (Greg Stark, )
      Re: Seqscan rather than Index  (Tom Lane, )
    Re: Seqscan rather than Index  ("Steinar H. Gunderson", )
     Re: Seqscan rather than Index  ("Steinar H. Gunderson", )
      Re: Seqscan rather than Index  (Frank Wiles, )
       Re: Seqscan rather than Index  ("Steinar H. Gunderson", )
       Re: Seqscan rather than Index  (Tom Lane, )
        Re: Seqscan rather than Index  (Frank Wiles, )
     Re: Seqscan rather than Index  (Bruno Wolff III, )
      Re: Seqscan rather than Index  ("Steinar H. Gunderson", )

Greg Stark <> writes:
> Postgres is also more pessimistic about the efficiency of index scans. It's
> willing to use a sequential scan down to well below 5% selectivity when other
> databases use the more traditional rule of thumb of 10%.

However, other databases are probably basing their analysis on a
different execution model.  Since we have to visit both heap and index
in all cases, we do indeed have a larger penalty for index use.

I've looked pretty closely at the cost model for index access, believe me.
It's not pessimistic; if anything it is undercharging for index access.
(For one thing it treats the index's internal fetches as sequential
access, when in reality they are probably random.)

I think the one effect that's not being modeled is amortization of index
fetches across successive queries.  The cost model is pretty much based
on the assumption that each query starts from ground zero, whereas in
reality a heavily used index will certainly have all its upper levels in
RAM, and if it's not too large the leaf pages might all be cached too.
I wouldn't want to switch the planner over to making that assumption
exclusively, but we could talk about having a cost parameter that dials
the assumption up or down.

Awhile back I tried rewriting btcostestimate to charge zero for
accessing the metapage and the upper index levels, but charge
random_page_cost for fetching leaf pages.  For small indexes this came
out with worse (larger) numbers than we have now, which is not the
direction we want to go in :-(.  So I think that we have to somehow
honestly model caching of index pages across queries.

Of course, to be completely fair such a modification should account for
caching of heap pages as well, so it would also bring down the estimates
for seqscans.  But I'd be willing to accept a model that considers only
caching of index pages as a zero-order approximation.

            regards, tom lane


pgsql-performance by date:

From: Frank Wiles
Date:
Subject: Re: Seqscan rather than Index
From: "Steinar H. Gunderson"
Date:
Subject: Re: Seqscan rather than Index