Re: plan question - query with order by and limit not choosing index depends on size of limit, table - Mailing list pgsql-performance

From Mike Broers
Subject Re: plan question - query with order by and limit not choosing index depends on size of limit, table
Date
Msg-id AANLkTikQ8p4V=g5heXGK1vg-7NCWAnkydYtybmqjraT+@mail.gmail.com
Whole thread Raw
In response to plan question - query with order by and limit not choosing index depends on size of limit, table  (Mike Broers <mbroers@gmail.com>)
Responses Re: plan question - query with order by and limit not choosing index depends on size of limit, table
List pgsql-performance
Thanks Robert, this is what I was looking for.  I will try these suggestions and follow up if any of them are the silver bullet.

On Fri, Jan 14, 2011 at 7:11 AM, Robert Haas wrote:
On Thu, Jan 6, 2011 at 4:36 PM, Mike Broers <mbroers@gmail.com> wrote:
> Thanks for the assistance.
> Here is an explain analyze of the query with the problem limit:
> production=# explain analyze select * from landing_page.messages where
> ((messages.topic = E'x') AND (messages.processed = 'f'))  ORDER BY
> messages.created_at ASC limit 10;
>
>    QUERY PLAN
>
> --------------------------------------------------------------------------------------------------------------
> ------------------------------------------------------------
>  Limit  (cost=0.00..2891.06 rows=10 width=1340) (actual
> time=207922.586..207922.586 rows=0 loops=1)
>    ->  Index Scan using idx_landing_page_messages_created_at on messages
>  (cost=0.00..449560.48 rows=1555 widt
> h=1340) (actual time=207922.581..207922.581 rows=0 loops=1)
>          Filter: ((NOT processed) AND ((topic)::text = 'x'::text))
>  Total runtime: 207949.413 ms
> (4 rows)

You're not the first person to have been bitten by this.  The
optimizer thinks that rows WHERE NOT processed and topic = 'x' are
reasonably common, so it figures that it can just index scan until it
finds 10 of them.  But when it turns out that there are none at all,
it ends up having to scan the entire index, which stinks big-time.

The alternative plan is to use a different index to find ALL the
relevant rows, sort them, and then take the top 10.   That would suck
if there actually were tons of rows like this, but there aren't.

So the root of the problem, in some sense, is that the planner's
estimate of the selectivity of "NOT processed and topic = 'x'" is not
very good.  Some things to try:

- increase the statistics target for the "processed" and "topic"
columns even higher
- put the processed rows in one table and the not processed rows in
another table
- do something like SELECT * FROM (SELECT .. LIMIT 200 OFFSET 0) LIMIT
10 to try to fool the planner into planning based on the higher, inner
limit
- create a partial index on messages (topic) WHERE NOT processed and
see if the planner will use it

...Robert

pgsql-performance by date:

Previous
From: Vitalii Tymchyshyn
Date:
Subject: Re: queries with lots of UNIONed relations
Next
From: Mladen Gogala
Date:
Subject: Re: The good, old times