Thread: Index Being Ignored?

Index Being Ignored?

From

Joe Lester

Date:

30 June 2006, 14:05:01

I have a index question. My table has 800K rows and I a doing a basic query on an indexed integer field which takes over 2 seconds to complete because it's ignoring the index for some reason. Any ideas as to why it's ignoring the index? I'm using postgres 8.0.2.

SELECT count(*) FROM purchase_order_items WHERE expected_quantity > '0'

EXPLAIN ANALYZE reveals that it's not using the index...

Aggregate (cost=22695.28..22695.28 rows=1 width=0) (actual time=2205.688..2205.724 rows=1 loops=1)

-> Seq Scan on purchase_order_items (cost=0.00..21978.08 rows=286882 width=0) (actual time=0.535..2184.405 rows=7458 loops=1)

Filter: (expected_quantity > 0)

Total runtime: 2207.203 ms

However, if I use the "SET ENABLE_SEQSCAN TO OFF" trick, then it does use the index and is much faster.

SET ENABLE_SEQSCAN TO OFF;

EXPLAIN ANALYZE SELECT count(*) FROM purchase_order_items WHERE expected_quantity > '0'

Aggregate (cost=1050659.46..1050659.46 rows=1 width=0) (actual time=137.393..137.441 rows=1 loops=1)

-> Index Scan using purchase_order_items_expected_quantity_idx on purchase_order_items (cost=0.00..1049942.25 rows=286882 width=0) (actual time=0.756..119.990 rows=7458 loops=1)

Index Cond: (expected_quantity > 0)

Total runtime: 139.185 ms

I could understand if this was a really complex query and the planner got confused... but this is such a simple query. Is it OK to use "SET ENABLE_SEQSCAN TO OFF;" in production code? Is there another solution?

Thanks!

------------------------------

-- Table Definition --

CREATE TABLE purchase_order_items (

id serial NOT NULL,

purchase_order_id integer,

manufacturer_id integer,

quantity integer,

product_name character varying(16),

short_description character varying(60),

expected_quantity integer,

received_quantity integer,

"position" real,

created_at timestamp without time zone DEFAULT now(),

updated_at timestamp without time zone

);

-- Index --

CREATE INDEX purchase_order_items_expected_quantity_idx ON purchase_order_items USING btree (expected_quantity);

Re: Index Being Ignored?

From

Markus Schaber

Date:

30 June 2006, 14:29:09

Hi, Joe,

Joe Lester wrote:
> Aggregate  (cost=22695.28..22695.28 rows=1 width=0) (actual
> time=2205.688..2205.724 rows=1 loops=1)
>   ->  Seq Scan on purchase_order_items  (cost=0.00..21978.08 rows=286882
> width=0) (actual time=0.535..2184.405 rows=7458 loops=1)
>         Filter: (expected_quantity > 0)

The query planner estimates that your filter will hit 286882 rows, while
in reality it hits only 7458 rows. That's why the query planer chooses a
sequential scan.

It seems that the statistics for the column expected_quantity are off.

My suggestions:

- make shure that the statistics are current by analyzing the table
appropriately (e. G. by using the autovacuum daemon from contrib).

- increase the statistics target for this column.

- if you run this query very often, an conditional index might make sense:

CREATE INDEX purchase_order_having_quantity_idx ON purchase_order_items
(expected_quantity) WHERE expected_quantity > 0;


HTH,
Markus

--
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf.     | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

Re: Index Being Ignored?

From

Tom Lane

Date:

30 June 2006, 14:41:55

Joe Lester <joe_lester@sweetwater.com> writes:
> SELECT count(*) FROM purchase_order_items WHERE expected_quantity > '0'

> Aggregate  (cost=22695.28..22695.28 rows=1 width=0) (actual
> time=2205.688..2205.724 rows=1 loops=1)
>    ->  Seq Scan on purchase_order_items  (cost=0.00..21978.08
> rows=286882 width=0) (actual time=0.535..2184.405 rows=7458 loops=1)
>          Filter: (expected_quantity > 0)
> Total runtime: 2207.203 ms

Why is the expected row count so far off --- have you analyzed the table
lately?  For such a simple WHERE condition the estimate should be pretty
accurate, if the stats are sufficient.  If this table is very large you
might need to increase the statistics targets, but more likely you just
haven't got up-to-date stats at all.

The planner *never* "ignores" an index.  It may deliberately decide not
to use it, if it thinks the seqscan plan will be faster, as it does in
this case --- note the much higher cost estimate for the indexscan:

> SET ENABLE_SEQSCAN TO OFF;
> EXPLAIN ANALYZE SELECT count(*) FROM purchase_order_items WHERE
> expected_quantity > '0'

> Aggregate  (cost=1050659.46..1050659.46 rows=1 width=0) (actual
> time=137.393..137.441 rows=1 loops=1)
>    ->  Index Scan using purchase_order_items_expected_quantity_idx on
> purchase_order_items  (cost=0.00..1049942.25 rows=286882 width=0)
> (actual time=0.756..119.990 rows=7458 loops=1)
>          Index Cond: (expected_quantity > 0)
> Total runtime: 139.185 ms

The reason the cost estimate is out of line with reality is mainly that
the rows estimate is out of line with reality.  There may be some index
order correlation it's not aware of too.

BTW you might want to think about updating to PG 8.1.  Its "bitmap"
index scans are much better suited for queries that are using a
relatively unselective index condition.

            regards, tom lane

Re: Index Being Ignored?

From

Joe Lester

Date:

30 June 2006, 16:31:12

great!

Thanks Markus and Tom!

On Jun 30, 2006, at 10:29 AM, Markus Schaber wrote:

> Hi, Joe,
>
> Joe Lester wrote:
>> Aggregate  (cost=22695.28..22695.28 rows=1 width=0) (actual
>> time=2205.688..2205.724 rows=1 loops=1)
>>   ->  Seq Scan on purchase_order_items  (cost=0.00..21978.08
>> rows=286882
>> width=0) (actual time=0.535..2184.405 rows=7458 loops=1)
>>         Filter: (expected_quantity > 0)
>
> The query planner estimates that your filter will hit 286882 rows,
> while
> in reality it hits only 7458 rows. That's why the query planer
> chooses a
> sequential scan.
>
> It seems that the statistics for the column expected_quantity are off.
>
> My suggestions:
>
> - make shure that the statistics are current by analyzing the table
> appropriately (e. G. by using the autovacuum daemon from contrib).
>
> - increase the statistics target for this column.
>
> - if you run this query very often, an conditional index might make
> sense:
>
> CREATE INDEX purchase_order_having_quantity_idx ON
> purchase_order_items
> (expected_quantity) WHERE expected_quantity > 0;
>
>
> HTH,
> Markus
>
> --
> Markus Schaber | Logical Tracking&Tracing International AG
> Dipl. Inf.     | Software Development GIS
>
> Fight against software patents in EU! www.ffii.org
> www.nosoftwarepatents.org
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
>
>                http://www.postgresql.org/docs/faq
>
>

Re: Index Being Ignored?

From

Alvaro Herrera

Date:

30 June 2006, 18:21:35

Joe Lester wrote:
> I have a index question. My table has 800K rows and I a doing a basic
> query on an indexed integer field which takes over 2 seconds to
> complete because it's ignoring the index for some reason. Any ideas
> as to why it's ignoring the index? I'm using postgres 8.0.2.
>
> SELECT count(*) FROM purchase_order_items WHERE expected_quantity > '0'
>
> EXPLAIN ANALYZE reveals that it's not using the index...
>
> Aggregate  (cost=22695.28..22695.28 rows=1 width=0) (actual
> time=2205.688..2205.724 rows=1 loops=1)
>   ->  Seq Scan on purchase_order_items  (cost=0.00..21978.08
> rows=286882 width=0) (actual time=0.535..2184.405 rows=7458 loops=1)
>         Filter: (expected_quantity > 0)
> Total runtime: 2207.203 ms

The estimated rowcount is far off.  When did you last run ANALYZE on
this table?

BTW, you should upgrade (to 8.0.8) unless you want known bugs to destroy
your data.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support