Re: Indexing a Boolean or Null column? - Mailing list pgsql-performance

From Tom Lane
Subject Re: Indexing a Boolean or Null column?
Date
Msg-id 19149.1073202498@sss.pgh.pa.us
Whole thread Raw
In response to Re: Indexing a Boolean or Null column?  (Christopher Kings-Lynne <chriskl@familyhealth.com.au>)
List pgsql-performance
Christopher Kings-Lynne <chriskl@familyhealth.com.au> writes:
>> Ok, so ...evenly distributed data on small set of values forces
>> sequential scan since that's faster.  I expected that based on
>> what I've read so far.

> Actually, it's more a case of that fetching an item via and index is
> considered, say, four times slower than fetching something off a
> sequential scan (sort of).  Hence, if you are selecting more than 25% of
> the table, then a sequential scan will be faster, even though it has to
> process more rows.

Actually it's worse than that: if an indexscan is going to fetch more
than a few percent of the table, the planner will think it slower than
a sequential scan --- and usually it'll be right.  The four-to-one ratio
refers to the cost of fetching a whole page (8K) randomly versus
sequentially.  In a seqscan, you can examine all the rows on a page
(dozens to hundreds usually) for the price of one page fetch.  In an
indexscan, one page fetch might bring in just one row that you care
about.  So the breakeven point is a lot worse than 4:1.

There is constant debate about the values of these parameters; in
particular the 4:1 page fetch cost ratio breaks down if you are able
to cache a significant fraction of the table in RAM.  See the list
archives for details.  But it's certainly true that an indexscan has to
be a lot more selective than 25% before it's going to be a win over
a seqscan.  I'd say 1% to 5% is the right ballpark.

            regards, tom lane

pgsql-performance by date:

Previous
From: Christopher Kings-Lynne
Date:
Subject: Re: Indexing a Boolean or Null column?
Next
From: Christopher Browne
Date:
Subject: Re: Indexing a Boolean or Null column?