Re: index scan on =, but not < ? - Mailing list pgsql-performance

From John Arbash Meinel
Subject Re: index scan on =, but not < ?
Date
Msg-id 422DF680.1000409@arbash-meinel.com
Whole thread Raw
In response to index scan on =, but not < ?  ("Rick Schumeyer" <rschumeyer@ieee.org>)
List pgsql-performance
Rick Schumeyer wrote:

> I have two index questions. The first is about an issue that has been
> recently discussed,
>
> and I just wanted to be sure of my understanding. Functions like
> count(), max(), etc. will
>
> use sequential scans instead of index scans because the index doesn’t
> know which rows
>
> are actually visible…is this correct?
>
Actually, index scans are chosen whenever the cost is expected to be
cheaper than a sequential scan. This is generally about < 10% of the
total number of rows.

> Second:
>
> I created an index in a table with over 10 million rows.
>
> The index is on field x, which is a double.
>
> The following command, as I expected, results in an index scan:
>
> =# explain select * from data where x = 0;
>
> QUERY PLAN
>
> -------------------------------------------------------------------------
>
> Index Scan using data_x_ix on data (cost=0.00..78.25 rows=19 width=34)
>
> Index Cond: (x = 0::double precision)
>
> (2 rows)
>
Since you have 10m rows, when it expects to get only 19 rows, it is much
faster to use an index.

> But this command, in which the only difference if > instead of =, is a
> sequential scan.
>
> =# explain select * from data where x > 0;
>
> QUERY PLAN
>
> ------------------------------------------------------------------
>
> Seq Scan on data (cost=0.00..1722605.20 rows=62350411 width=34)
>
> Filter: (x > 0::double precision)
>
> (2 rows)
>
Here, pg expects to find 62M rows (you must have significantly more than
10M rows). In this case a sequential scan is much faster than an indexed
one, so that's what pg does.

> Why is this?
>
> (This is with pg 8.0.1 on a PC running FC3 with 1GB ram…if it matters)
>
If you think there is truly a performance problem, try attaching the
results of "explain analyze" in which we might be able to tell you that
your statistics inaccurate (run vacuum analyze if you haven't).

John
=:->


Attachment

pgsql-performance by date:

Previous
From: Thomas F.O'Connell
Date:
Subject: Re: index scan on =, but not < ?
Next
From: "Rick Schumeyer"
Date:
Subject: Re: index scan on =, but not < ?