Home > mailing lists

Re: plan question - query with order by and limit not choosing index depends on size of limit, table - Mailing list pgsql-performance

From	Mike Broers
Subject	Re: plan question - query with order by and limit not choosing index depends on size of limit, table
Date	January 14, 2011 12:36:53
Msg-id	AANLkTikQ8p4V=g5heXGK1vg-7NCWAnkydYtybmqjraT+@mail.gmail.com Whole thread
In response to	plan question - query with order by and limit not choosing index depends on size of limit, table (Mike Broers <mbroers@gmail.com>)
Responses	Re: plan question - query with order by and limit not choosing index depends on size of limit, table
List	pgsql-performance

Tree view

Thanks Robert, this is what I was looking for. I will try these suggestions and follow up if any of them are the silver bullet.

On Fri, Jan 14, 2011 at 7:11 AM, Robert Haas wrote:

On Thu, Jan 6, 2011 at 4:36 PM, Mike Broers <mbroers@gmail.com> wrote:
> Thanks for the assistance.
> Here is an explain analyze of the query with the problem limit:
> production=# explain analyze select * from landing_page.messages where
> ((messages.topic = E'x') AND (messages.processed = 'f')) ORDER BY
> messages.created_at ASC limit 10;
>
> QUERY PLAN
>
> --------------------------------------------------------------------------------------------------------------
> ------------------------------------------------------------
> Limit (cost=0.00..2891.06 rows=10 width=1340) (actual
> time=207922.586..207922.586 rows=0 loops=1)
> -> Index Scan using idx_landing_page_messages_created_at on messages
> (cost=0.00..449560.48 rows=1555 widt
> h=1340) (actual time=207922.581..207922.581 rows=0 loops=1)
> Filter: ((NOT processed) AND ((topic)::text = 'x'::text))
> Total runtime: 207949.413 ms
> (4 rows)

You're not the first person to have been bitten by this. The
optimizer thinks that rows WHERE NOT processed and topic = 'x' are
reasonably common, so it figures that it can just index scan until it
finds 10 of them. But when it turns out that there are none at all,
it ends up having to scan the entire index, which stinks big-time.

The alternative plan is to use a different index to find ALL the
relevant rows, sort them, and then take the top 10. That would suck
if there actually were tons of rows like this, but there aren't.

So the root of the problem, in some sense, is that the planner's
estimate of the selectivity of "NOT processed and topic = 'x'" is not
very good. Some things to try:

- increase the statistics target for the "processed" and "topic"
columns even higher
- put the processed rows in one table and the not processed rows in
another table
- do something like SELECT * FROM (SELECT .. LIMIT 200 OFFSET 0) LIMIT
10 to try to fool the planner into planning based on the higher, inner
limit
- create a partial index on messages (topic) WHERE NOT processed and
see if the planner will use it

...Robert

pgsql-performance by date:

From: Vitalii Tymchyshyn
Date: 14 January 2011, 07:39:08
Subject: Re: queries with lots of UNIONed relations

From: Mladen Gogala
Date: 14 January 2011, 12:55:04
Subject: Re: The good, old times

Re: plan question - query with order by and limit not choosing index depends on size of limit, table - Mailing list pgsql-performance

Previous

Next