Thread: Planner doesn't chose Index - (slow select)

Planner doesn't chose Index - (slow select)

From
"patrick keshishian"
Date:
Hi all,

I've been struggling with some performance issues with certain
SQL queries.  I was prepping a long-ish overview of my problem
to submit, but I think I'll start out with a simple case of the
problem first, hopefully answers I receive will help me solve
my initial issue.

Consider the following two queries which yield drastically different
run-time:

db=# select count(*) from pk_c2 b0 where b0.offer_id=7141;
 count
-------
     1
(1 row)
Time: 5139.004 ms

db=# select count(*) from pk_c2 b0 where b0.pending=true and b0.offer_id=7141;
 count
-------
     1
(1 row)
Time: 1.828 ms


That's 2811 times faster!

Just to give you an idea of size of pk_c2 table:

db=# select count(*) from pk_c2 ;
  count
---------
 2158094
(1 row)
Time: 5275.782 ms

db=# select count(*) from pk_c2 where pending=true;
 count
-------
    51
(1 row)
Time: 5073.699 ms



db=# explain select count(*) from pk_c2 b0 where b0.offer_id=7141;
QUERY PLAN
---------------------------------------------------------------------------
 Aggregate  (cost=44992.78..44992.78 rows=1 width=0)
   ->  Seq Scan on pk_c2 b0  (cost=0.00..44962.50 rows=12109 width=0)
         Filter: (offer_id = 7141)
(3 rows)
Time: 1.350 ms

db=# explain select count(*) from pk_c2 b0 where b0.pending=true and
b0.offer_id=7141;
QUERY PLAN
----------------------------------------------------------------------------------------
 Aggregate  (cost=45973.10..45973.10 rows=1 width=0)
   ->  Index Scan using pk_boidx on pk_c2 b0  (cost=0.00..45973.09
rows=1 width=0)
         Index Cond: (offer_id = 7141)
         Filter: (pending = true)
(4 rows)
Time: 1.784 ms



The table has indexes for both 'offer_id' and '(pending=true)':

Indexes:
    "pk_boidx" btree (offer_id)
    "pk_bpidx" btree (((pending = true)))

So, why would the planner chose to use the index on the second query
and not on the first?


Note that I am able to fool the planner into using an Index scan
on offer_id by adding a silly new condition in the where clause of
the first form of the query:


db=# explain select count(*) from pk_c2 b0 where b0.offer_id=7141 and oid > 1;
QUERY PLAN
-------------------------------------------------------------------------------------------
 Aggregate  (cost=45983.19..45983.19 rows=1 width=0)
   ->  Index Scan using pk_boidx on pk_c2 b0  (cost=0.00..45973.09
rows=4037 width=0)
         Index Cond: (offer_id = 7141)
         Filter: (oid > 1::oid)
(4 rows)
Time: 27.301 ms

db=# select count(*) from pk_c2 b0 where b0.offer_id=7141 and oid > 1;
 count
-------
     1
(1 row)
Time: 1.900 ms

What gives?

This seems just too hokey for my taste.

--patrick



db=# select version();
                                 version
-------------------------------------------------------------------------
 PostgreSQL 7.4.12 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.3.6

Re: Planner doesn't chose Index - (slow select)

From
Tom Lane
Date:
"patrick keshishian" <pkeshish@gmail.com> writes:
> I've been struggling with some performance issues with certain
> SQL queries.  I was prepping a long-ish overview of my problem
> to submit, but I think I'll start out with a simple case of the
> problem first, hopefully answers I receive will help me solve
> my initial issue.

Have you ANALYZEd this table lately?

> db=# select count(*) from pk_c2 b0 where b0.offer_id=7141;
>  count
> -------
>      1
> (1 row)

The planner is evidently estimating that there are 12109 such rows,
not 1, which is the reason for its reluctance to use an indexscan.
Generally the only reason for it to be off that far on such a simple
statistical issue is if you haven't updated the stats in a long time.
(If you've got a really skewed data distribution for offer_id, you
might need to raise the statistics target for it.)

> The table has indexes for both 'offer_id' and '(pending=true)':

> Indexes:
>     "pk_boidx" btree (offer_id)
>     "pk_bpidx" btree (((pending = true)))

The expression index on (pending = true) won't do you any good,
unless you spell your query in a weird way like
    ... WHERE (pending = true) = true
I'd suggest a plain index on "pending" instead.

> db=# select version();
>  PostgreSQL 7.4.12 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.3.6

You might want to think about an update, too.  7.4 is pretty long in the
tooth.

            regards, tom lane

Re: Planner doesn't chose Index - (slow select)

From
"patrick keshishian"
Date:
Tom,

You are absolutely correct about not having run ANALYZE
on the particular table.

In my attempt to create a simple "test case" I created that
table (pk_c2) from the original and had not run ANALYZE
on it, even though, ANALYZE had been run prior to building
that table.

The problem on the test table and the simple select count(*)
is no longer there (after ANALYZE).

The original issue, however, is still there. I'm stumped as
how to formulate my question without having to write a
lengthy essay.

As to upgrading from 7.4, I hear you, but I'm trying to support
a deployed product.

Thanks again for your input,
--patrick




On 4/18/06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "patrick keshishian" <pkeshish@gmail.com> writes:
> > I've been struggling with some performance issues with certain
> > SQL queries.  I was prepping a long-ish overview of my problem
> > to submit, but I think I'll start out with a simple case of the
> > problem first, hopefully answers I receive will help me solve
> > my initial issue.
>
> Have you ANALYZEd this table lately?
>
> > db=# select count(*) from pk_c2 b0 where b0.offer_id=7141;
> >  count
> > -------
> >      1
> > (1 row)
>
> The planner is evidently estimating that there are 12109 such rows,
> not 1, which is the reason for its reluctance to use an indexscan.
> Generally the only reason for it to be off that far on such a simple
> statistical issue is if you haven't updated the stats in a long time.
> (If you've got a really skewed data distribution for offer_id, you
> might need to raise the statistics target for it.)
>
> > The table has indexes for both 'offer_id' and '(pending=true)':
>
> > Indexes:
> >     "pk_boidx" btree (offer_id)
> >     "pk_bpidx" btree (((pending = true)))
>
> The expression index on (pending = true) won't do you any good,
> unless you spell your query in a weird way like
>         ... WHERE (pending = true) = true
> I'd suggest a plain index on "pending" instead.
>
> > db=# select version();
> >  PostgreSQL 7.4.12 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.3.6
>
> You might want to think about an update, too.  7.4 is pretty long in the
> tooth.
>
>                         regards, tom lane