Re: why is the LIMIT clause slowing down this SELECT? - Mailing list pgsql-general

From Jeff Davis
Subject Re: why is the LIMIT clause slowing down this SELECT?
Date
Msg-id 1186006259.27620.138.camel@dogma.ljc.laika.com
In response to why is the LIMIT clause slowing down this SELECT?  ("Mason Hale" <masonhale@gmail.com>)
Responses Re: why is the LIMIT clause slowing down this SELECT?  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: why is the LIMIT clause slowing down this SELECT?  ("Mason Hale" <masonhale@gmail.com>)
List pgsql-general
On Wed, 2007-08-01 at 15:56 -0500, Mason Hale wrote:
> On a 8.1.9 version database that has been recently vacuumed and
> analyzed, I'm seeing some dramatic performance degradation if a limit
> clause is included in the query. This seems counter-intuitive to me.
>
> Here's the query and explain plan WITH the LIMIT clause:
>
> SELECT *
>    FROM topic_feed
>  WHERE topic_id = 106947234
>   ORDER BY score DESC
>      LIMIT 25
>
> Limit  (cost=0.00..651.69 rows=25 width=29) (actual
> time=72644.652..72655.029 rows=25 loops=1)
>   ->  Index Scan Backward using topic_feed_score_index on topic_feed
> (cost=0.00..21219.08 rows=814 width=29) (actual
> time=72644.644..72654.855 rows=25 loops=1)
>         Filter: (topic_id = 106947234)
> Total runtime: 72655.733 ms
>
> ==============
>
> and now WITHOUT the LIMIT clause:
>
> SELECT *
>    FROM topic_feed
>  WHERE topic_id = 106947234
>   ORDER BY score DESC
>
> Sort  (cost=1683.75..1685.78 rows=814 width=29) (actual
> time=900.553..902.267 rows=492 loops=1)
>   Sort Key: score
>   ->  Bitmap Heap Scan on topic_feed  (cost=7.85..1644.40 rows=814
> width=29) (actual time=307.900..897.993 rows=492 loops=1)
>         Recheck Cond: (topic_id = 106947234)
>         ->  Bitmap Index Scan on
> index_topic_feed_on_topic_id_and_feed_id  (cost=0.00..7.85 rows=814
> width=0) (actual time=213.205..213.205 rows=2460 loops=1)
>               Index Cond: (topic_id = 106947234)
> Total runtime: 904.049 ms

Let's call those plan 1 and plan 2.

In plan 1, the planner thinks it will find 25 tuples matching that
topic_id quickly during the backward index scan on
topic_feed_score_index. Instead, it apparently has to walk through a
lot of index entries (in score order) before it finds 25 that match.

In plan 2, it works the other way around: first it finds the tuples
matching the topic_id, then it sorts them by score. With a LIMIT on
top of that plan, it would only have to return the first 25.

If I'm reading the plans correctly (plan 2, in particular), there are
2460 index entries matching the topic_id, but only 492 of the
underlying tuples are visible. I don't know why so many invisible
tuples still exist in the index after your VACUUM.
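
If you want to check, VACUUM VERBOSE on that table reports how many
dead row versions it removes, how many it cannot remove yet, and how
many row versions each index contains, e.g.:

    VACUUM VERBOSE topic_feed;

Comparing the row-version counts it prints for the indexes against the
number of live rows gives a rough idea of how much dead space is there.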

Are there a lot of UPDATEs/DELETEs on tuples with that topic_id between
when you run the VACUUM and when you run the SELECT? Do you have any
long-running transactions open while the VACUUM runs? (Those can keep
VACUUM from removing dead tuples.)

Also, how many tuples are in the table overall?
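
For a quick look, something like this (untested, using your table name)
will do:

    -- exact row count
    SELECT count(*) FROM topic_feed;

    -- the planner's own notion of table size, from the last ANALYZE
    SELECT reltuples, relpages FROM pg_class WHERE relname = 'topic_feed';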

The stats look fairly accurate: the estimate of matching rows is only
off by about a factor of two (which isn't terribly bad), 814 estimated
versus 492 actually visible and matching the topic_id.
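
If you're curious where the 814 comes from, you can look at what
ANALYZE gathered for that column (again untested, just the standard
pg_stats view):

    SELECT null_frac, n_distinct, most_common_vals, most_common_freqs
      FROM pg_stats
     WHERE tablename = 'topic_feed'
       AND attname = 'topic_id';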

If I had to guess, I'd say you either:

(a) have too many invisible tuples - you can correct this with more
frequent VACUUMing; or
(b) have an uneven distribution of tuples with that topic_id, where many
of the matching tuples happen to have a low score - this is harder to
fix, because the stats collector can't detect it. You'd probably have
to coerce it into using plan 2 (something like the sketch below).
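
For example, something like this (untested) should do it, by turning
the sort key into an expression that topic_feed_score_index can no
longer satisfy:

    SELECT *
      FROM topic_feed
     WHERE topic_id = 106947234
     ORDER BY score + 0 DESC  -- "+ 0" hides the score ordering from the index
     LIMIT 25;

Because the sort key is now an expression rather than the bare column,
the planner can't use a backward scan of the score index to produce the
ordering; it has to fetch the matching rows first and sort them, with
the LIMIT applied after the sort.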

Regards,
    Jeff Davis



