Thread: Query not using index

Query not using index

From
ryan@paymentalliance.net
Date:
Here's the query:

SELECT
cart_row_id
FROM
pa_shopping_cart
WHERE
order_id = 20;

Here's the schema:

       Attribute       |           Type           |
Modifier

-----------------------+--------------------------+---------------------------------------------------------------------
 cart_row_id           | integer                  | not null default
 product_id            | integer                  | not null
 color                 | character varying(100)   |
 size                  | character varying(100)   |
 style                 | character varying(100)   |
 order_id              | integer                  | not null
 time_added            | timestamp with time zone |
 voided_date           | timestamp with time zone |
 voided_by             | integer                  |
 expired_date          | timestamp with time zone |
 item_state            | character varying(50)    | default
'in_basket'
 received_back_date    | timestamp with time zone |
 price_charged         | numeric(30,6)            |
 price_refunded        | numeric(30,6)            |
 shipping_charged      | numeric(30,6)            |
 shipping_refunded     | numeric(30,6)            |
 price_tax_charged     | numeric(30,6)            |
 price_tax_refunded    | numeric(30,6)            |
 shipping_tax_charged  | numeric(30,6)            |
 shipping_tax_refunded | numeric(30,6)            |
 price_name            | character varying(30)    |
 refund_id             | integer                  |
 cs_comments           | text                     |
 price                 | numeric(30,6)            |
 ship_group_id         | integer                  |
 ship_package_id       | integer                  |
 delivery_date         | date                     |
 sentiment             | text                     |
 vendor_id             | integer                  |
 linkshare_sent        | bit(1)                   |
 mapped_to_address     | character(1)             |
 product_name          | character varying(200)   |
Indices: delivery_date_pa_shopping_cart_,
         pa_cart_by_item_state,
         pa_cart_by_order,
         pa_cart_by_product,
         pa_cart_row_order,
         pa_item_map_to_addr,
         pa_shop_cart_prod_ord_idx,
         pa_shopping_cart_pkey,
         ship_package_id_pa_shopping_car,
         vendor_id_pa_shopping_cart_key

There is an index on:
just order_id
just order_id and cart_row_id
and a PK on cart row_id

I don't understand why it's not using one of these indexes!
Please post your responses to the group - my email is down.

Thanks for any help!

-r

Re: Query not using index

From
Stephan Szabo
Date:
Have you vacuum analyzed recently and what does
explain show for the query?

On Thu, 10 May 2001 ryan@paymentalliance.net wrote:

> Here's the query:
>
> SELECT
> cart_row_id
> FROM
> pa_shopping_cart
> WHERE
> order_id = 20;
> [ ... ]
> There is an index on:
> just order_id
> just order_id and cart_row_id
> and a PK on cart row_id
>
> I don't understand why it's not using one of these indexes!
> Please post your responses to the group - my email is down.


Re: Query not using index

From
ryan@paymentalliance.net
Date:
I vacuum every half hour!  Here is the output from EXPLAIN:

NOTICE:  QUERY PLAN:

Seq Scan on pa_shopping_cart  (cost=0.00..7237.94 rows=9420 width=296)

EXPLAIN

Thanks!


On Thu, 10 May 2001 18:19:16 +0000 (UTC),
sszabo@megazone23.bigpanda.com (Stephan Szabo) wrote:

>
>Have you vacuum analyzed recently and what does
>explain show for the query?
>
>On Thu, 10 May 2001 ryan@paymentalliance.net wrote:
>
>> Here's the query:
>>
>> SELECT
>> cart_row_id
>> FROM
>> pa_shopping_cart
>> WHERE
>> order_id = 20;
>> [ ... ]
>> There is an index on:
>> just order_id
>> just order_id and cart_row_id
>> and a PK on cart row_id
>>
>> I don't understand why it's not using one of these indexes!
>> Please post your responses to the group - my email is down.
>
>
>---------------------------(end of broadcast)---------------------------
>TIP 2: you can get off all lists at once with the unregister command
>    (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)


Re: Query not using index

From
"Mitch Vincent"
Date:
Does that query really return 9420 rows ? If so, a sequential scan is
probably better/faster than an index scan..

-Mitch

----- Original Message -----
From: <ryan@paymentalliance.net>
To: <pgsql-general@postgresql.org>
Sent: Thursday, May 10, 2001 9:22 AM
Subject: Re: Query not using index


> I vacuum every half hour!  Here is the output from EXPLAIN:
>
> NOTICE:  QUERY PLAN:
>
> Seq Scan on pa_shopping_cart  (cost=0.00..7237.94 rows=9420 width=296)
>
> EXPLAIN
>
> Thanks!
>
>
> On Thu, 10 May 2001 18:19:16 +0000 (UTC),
> sszabo@megazone23.bigpanda.com (Stephan Szabo) wrote:
>
> >
> >Have you vacuum analyzed recently and what does
> >explain show for the query?
> >
> >On Thu, 10 May 2001 ryan@paymentalliance.net wrote:
> >
> >> Here's the query:
> >>
> >> SELECT
> >> cart_row_id
> >> FROM
> >> pa_shopping_cart
> >> WHERE
> >> order_id = 20;
> >> [ ... ]
> >> There is an index on:
> >> just order_id
> >> just order_id and cart_row_id
> >> and a PK on cart row_id
> >>
> >> I don't understand why it's not using one of these indexes!
> >> Please post your responses to the group - my email is down.
> >
> >
> >---------------------------(end of broadcast)---------------------------
> >TIP 2: you can get off all lists at once with the unregister command
> >    (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/users-lounge/docs/faq.html
>


Re: Re: Query not using index

From
Patrick Welche
Date:
On Thu, May 10, 2001 at 01:22:56PM +0000, ryan@paymentalliance.net wrote:
> I vacuum every half hour!  Here is the output from EXPLAIN:
>
> NOTICE:  QUERY PLAN:
>
> Seq Scan on pa_shopping_cart  (cost=0.00..7237.94 rows=9420 width=296)
>
> EXPLAIN
>
> Thanks!

Then try

set enable_seqscan to off;
explain select ...

and see what the cost is. Compare it with the value above.

Cheers,

Patrick

Re: Query not using index

From
ryan@paymentalliance.net
Date:
No the query usually returns between 0 and 5 rows.  Usually not zero -
most often 1.

-r

On Thu, 10 May 2001 19:47:32 +0000 (UTC), mitch@venux.net ("Mitch
Vincent") wrote:

>Does that query really return 9420 rows ? If so, a sequential scan is
>probably better/faster than an index scan..
>
>-Mitch
>
>----- Original Message -----
>From: <ryan@paymentalliance.net>
>To: <pgsql-general@postgresql.org>
>Sent: Thursday, May 10, 2001 9:22 AM
>Subject: Re: Query not using index
>
>
>> I vacuum every half hour!  Here is the output from EXPLAIN:
>>
>> NOTICE:  QUERY PLAN:
>>
>> Seq Scan on pa_shopping_cart  (cost=0.00..7237.94 rows=9420 width=296)
>>
>> EXPLAIN
>>
>> Thanks!
>>
>>
>> On Thu, 10 May 2001 18:19:16 +0000 (UTC),
>> sszabo@megazone23.bigpanda.com (Stephan Szabo) wrote:
>>
>> >
>> >Have you vacuum analyzed recently and what does
>> >explain show for the query?
>> >
>> >On Thu, 10 May 2001 ryan@paymentalliance.net wrote:
>> >
>> >> Here's the query:
>> >>
>> >> SELECT
>> >> cart_row_id
>> >> FROM
>> >> pa_shopping_cart
>> >> WHERE
>> >> order_id = 20;
>> >> [ ... ]
>> >> There is an index on:
>> >> just order_id
>> >> just order_id and cart_row_id
>> >> and a PK on cart row_id
>> >>
>> >> I don't understand why it's not using one of these indexes!
>> >> Please post your responses to the group - my email is down.
>> >
>> >
>> >---------------------------(end of broadcast)---------------------------
>> >TIP 2: you can get off all lists at once with the unregister command
>> >    (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
>>
>>
>> ---------------------------(end of broadcast)---------------------------
>> TIP 5: Have you checked our extensive FAQ?
>>
>> http://www.postgresql.org/users-lounge/docs/faq.html
>>
>
>
>---------------------------(end of broadcast)---------------------------
>TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org


Re: Re: Query not using index

From
Tom Lane
Date:
ryan@paymentalliance.net writes:
> No the query usually returns between 0 and 5 rows.  Usually not zero -
> most often 1.

Ah.  You must have a few values that are far more frequent (like tens of
thousands of occurrences?) and these are throwing off the planner's
statistics.

7.2 will probably do better with this sort of data distribution, but for
now it's a difficult problem.

            regards, tom lane

Re: Re: Query not using index

From
Chris Jones
Date:
On Thu, May 10, 2001 at 05:22:07PM -0400, Tom Lane wrote:

> ryan@paymentalliance.net writes:
> > No the query usually returns between 0 and 5 rows.  Usually not zero -
> > most often 1.
>
> Ah.  You must have a few values that are far more frequent (like tens of
> thousands of occurrences?) and these are throwing off the planner's
> statistics.

I had a similar situation, where I had a lot of rows with 0's in
them.  Changing those to NULLs worked wonders.  The planner (or
statistics gatherer, or something) apparently takes notice of the
distribution of non-NULL values.

Chris

--
chris@mt.sri.com -----------------------------------------------------
Chris Jones                                    SRI International, Inc.
                                                           www.sri.com

Attachment

Re: Re: Query not using index

From
Tom Lane
Date:
Chris Jones <chris@mt.sri.com> writes:
>> Ah.  You must have a few values that are far more frequent (like tens of
>> thousands of occurrences?) and these are throwing off the planner's
>> statistics.

> I had a similar situation, where I had a lot of rows with 0's in
> them.  Changing those to NULLs worked wonders.

Yes, if you have a lot of "dummy" values it's a good idea to represent
them as NULLs rather than some arbitrarily-chosen regular data value.
The planner does keep track of NULLs separately from everything else.

            regards, tom lane

Re: Re: Query not using index

From
"Richard Huxton"
Date:
From: "Tom Lane" <tgl@sss.pgh.pa.us>

> > I had a similar situation, where I had a lot of rows with 0's in
> > them.  Changing those to NULLs worked wonders.
>
> Yes, if you have a lot of "dummy" values it's a good idea to represent
> them as NULLs rather than some arbitrarily-chosen regular data value.
> The planner does keep track of NULLs separately from everything else.

Is there a good reason why rdbms don't just keep a cache of decisions on
this stuff. I realise SQL is supposed to be ad-hoc but in reality, it's the
old 90:10 rule where a handful of queries get run consistently and where
performance is important.

Why doesn't PG (or any other system afaik) just have a first guess, run the
query and then if the costs are horribly wrong cache the right result. I'm
guessing there's a bloody good reason (TM) for it since query planning has
got to be equivalent to least-cost path so NP (NP-Complete? I forget - too
long out of college).

- Richard Huxton


Re: Re: Query not using index

From
Tom Lane
Date:
"Richard Huxton" <dev@archonet.com> writes:
> Why doesn't PG (or any other system afaik) just have a first guess, run the
> query and then if the costs are horribly wrong cache the right result.

?? Knowing that your previous guess was wrong doesn't tell you what the
right answer is, especially not for the somewhat-different question that
the next query is likely to provide.

The real problem here is simply that PG hasn't been keeping adequately
detailed statistics.  I'm currently working on improving that for 7.2...
see discussions over in pghackers if you are interested.

            regards, tom lane

Re: Re: Query not using index

From
Bruce Momjian
Date:
> Is there a good reason why rdbms don't just keep a cache of decisions on
> this stuff. I realise SQL is supposed to be ad-hoc but in reality, it's the
> old 90:10 rule where a handful of queries get run consistently and where
> performance is important.
>
> Why doesn't PG (or any other system afaik) just have a first guess, run the
> query and then if the costs are horribly wrong cache the right result. I'm
> guessing there's a bloody good reason (TM) for it since query planning has
> got to be equivalent to least-cost path so NP (NP-Complete? I forget - too
> long out of college).

I have asked about this before.  Decisions about sequential/index scans
could be theoretically fed from the executor back to the optimizer for
later user.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Re: Query not using index

From
ryan@paymentalliance.net
Date:
You and Stephan hit it right on the nose - our table has been
maliciously propagated with thousands of faulty values - once gone
index are in use and DB is SPEEDING along  8)

Thanks for your help!!!

-r


On Thu, 10 May 2001 21:49:28 +0000 (UTC), in
comp.databases.postgresql.general you wrote:

>--w2JjAQZceEVGylhD
>Content-Type: text/plain; charset=us-ascii
>Content-Disposition: inline
>Content-Transfer-Encoding: quoted-printable
>
>On Thu, May 10, 2001 at 05:22:07PM -0400, Tom Lane wrote:
>
>> ryan@paymentalliance.net writes:
>> > No the query usually returns between 0 and 5 rows.  Usually not zero -
>> > most often 1.
>>=20
>> Ah.  You must have a few values that are far more frequent (like tens of
>> thousands of occurrences?) and these are throwing off the planner's
>> statistics.
>
>I had a similar situation, where I had a lot of rows with 0's in
>them.  Changing those to NULLs worked wonders.  The planner (or
>statistics gatherer, or something) apparently takes notice of the
>distribution of non-NULL values.
>
>Chris
>
>--=20
>chris@mt.sri.com -----------------------------------------------------
>Chris Jones                                    SRI International, Inc.
>                                                           www.sri.com
>
>--w2JjAQZceEVGylhD
>Content-Type: application/pgp-signature
>Content-Disposition: inline
>
>-----BEGIN PGP SIGNATURE-----
>Version: GnuPG v1.0.4 (NetBSD)
>Comment: For info see http://www.gnupg.org
>
>iEYEARECAAYFAjr7CVoACgkQ4nX8TnrnU2+p0ACaAoSNEtwIlibMlh+H9ehJecmy
>lBcAnjI0TYJubbSIwgzi8DuRxoos4OwT
>=edfw
>-----END PGP SIGNATURE-----
>
>--w2JjAQZceEVGylhD
>Content-Type: text/plain
>Content-Disposition: inline
>Content-Transfer-Encoding: binary
>MIME-Version: 1.0
>
>
>---------------------------(end of broadcast)---------------------------
>TIP 4: Don't 'kill -9' the postmaster
>
>--w2JjAQZceEVGylhD--


Re: Re: Query not using index

From
"Richard Huxton"
Date:
From: "Tom Lane" <tgl@sss.pgh.pa.us>

> "Richard Huxton" <dev@archonet.com> writes:
> > Why doesn't PG (or any other system afaik) just have a first guess, run
the
> > query and then if the costs are horribly wrong cache the right result.
>
> ?? Knowing that your previous guess was wrong doesn't tell you what the
> right answer is, especially not for the somewhat-different question that
> the next query is likely to provide.

Surely if you used a seqscan on "where x=1" and only got 2 rows rather than
the 3000 you were expecting the only alternative is to try an index?

> The real problem here is simply that PG hasn't been keeping adequately
> detailed statistics.  I'm currently working on improving that for 7.2...
> see discussions over in pghackers if you are interested.

Thinking about it (along with Bruce's reply posted to the list) I guess the
difference is whether you gather the statistics up-front during a vacuum, or
build them as queries are used. You're always going to need *something* to
base your first guess on anyway - the "learning" would only help you in
those cases where the distribution of values wasn't a normal curve.

Anyway, given that I'm up to my neck in work at the moment and I don't
actually know what I'm talking about, I'll shut up and get back to keeping
clients happy :-)

- Richard Huxton


Re: Re: Query not using index

From
Tom Lane
Date:
"Richard Huxton" <dev@archonet.com> writes:
>> ?? Knowing that your previous guess was wrong doesn't tell you what the
>> right answer is, especially not for the somewhat-different question that
>> the next query is likely to provide.

> Surely if you used a seqscan on "where x=1" and only got 2 rows rather than
> the 3000 you were expecting the only alternative is to try an index?

But if the next query is "where x=2", what do you do?  Keep in mind that
the data distributions people have been having trouble with are
irregular: you can't conclude anything very reliable about x=2 based on
what you know about x=1.

> Thinking about it (along with Bruce's reply posted to the list) I guess the
> difference is whether you gather the statistics up-front during a vacuum, or
> build them as queries are used.

Stats gathered as a byproduct of individual queries might be useful if
you happen to get the exact same queries over again, but I doubt that
a succession of such results should be expected to build up a picture
that's complete enough to extrapolate to other queries.  Stats gathered
by ANALYZE have the merit that they come from a process that's designed
specifically to give you a good statistical picture.

            regards, tom lane