Rick Schumeyer wrote:
> I have two index questions. The first is about an issue that has been
> recently discussed,
>
> and I just wanted to be sure of my understanding. Functions like
> count(), max(), etc. will
>
> use sequential scans instead of index scans because the index doesn’t
> know which rows
>
> are actually visible…is this correct?
>
Actually, index scans are chosen whenever the cost is expected to be
cheaper than a sequential scan. This is generally about < 10% of the
total number of rows.
> Second:
>
> I created an index in a table with over 10 million rows.
>
> The index is on field x, which is a double.
>
> The following command, as I expected, results in an index scan:
>
> =# explain select * from data where x = 0;
>
> QUERY PLAN
>
> -------------------------------------------------------------------------
>
> Index Scan using data_x_ix on data (cost=0.00..78.25 rows=19 width=34)
>
> Index Cond: (x = 0::double precision)
>
> (2 rows)
>
Since you have 10m rows, when it expects to get only 19 rows, it is much
faster to use an index.
> But this command, in which the only difference if > instead of =, is a
> sequential scan.
>
> =# explain select * from data where x > 0;
>
> QUERY PLAN
>
> ------------------------------------------------------------------
>
> Seq Scan on data (cost=0.00..1722605.20 rows=62350411 width=34)
>
> Filter: (x > 0::double precision)
>
> (2 rows)
>
Here, pg expects to find 62M rows (you must have significantly more than
10M rows). In this case a sequential scan is much faster than an indexed
one, so that's what pg does.
> Why is this?
>
> (This is with pg 8.0.1 on a PC running FC3 with 1GB ram…if it matters)
>
If you think there is truly a performance problem, try attaching the
results of "explain analyze" in which we might be able to tell you that
your statistics inaccurate (run vacuum analyze if you haven't).
John
=:->