Re: Cost of sort/order by not estimated by the query planner - Mailing list pgsql-performance

From Laurent Laborde
Subject Re: Cost of sort/order by not estimated by the query planner
Date
Msg-id 8a1bfe660912020501v38013544q8a2cb9048613d6e4@mail.gmail.com
Whole thread Raw
In response to Re: Cost of sort/order by not estimated by the query planner  (Greg Stark <gsstark@mit.edu>)
Responses Re: Cost of sort/order by not estimated by the query planner  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-performance
On Wed, Dec 2, 2009 at 1:47 PM, Greg Stark <gsstark@mit.edu> wrote:
> On Wed, Dec 2, 2009 at 11:13 AM, Laurent Laborde <kerdezixe@gmail.com> wrote:
>>>                                           QUERY PLAN
>>> -------------------------------------------------------------------------------------------------
>>>  Limit  (cost=0.00..2042.87 rows=5 width=1114)
>>>   ->  Index Scan using _article_pkey on _article
>>> (cost=0.00..7066684.46 rows=17296 width=1114)
>>>         Filter: (bitfield && B'1'::bit varying)
>>
>
> Ah, I missed this the first time around. It's scanning _article_pkey
> here. Ie, it's scanning the table from the oldest to the newest
> article assuming that the values wihch satisfy that constraint are
> evenly distributed and it'll find five of them pretty quickly. In
> reality there's a correlation between this bit being set and the value
> of _article.id and all the ones with it set are towards the end.
> Postgres doesn't have any statistics on how multiple columns are
> related yet so it can't know this.
>
> If this is an important query you might try having an index on
> <bitfield,id> or a partial index on "id where bitfield && B'1' ". The
> latter sounds like what you really need

There is, indeed, a lot of tricks and hacks.
Maybe my question was too confusing.

The question is : why a limit 5 is much much slower than a limit 500 ?

The problem is in the "order by" and not "finding enough the data that
match the filter".
Even if it's not evenly distributed, the queries without "order by"
are much much faster, EVEN when using the "pkey query plan".

without "order by" using the bitmap -> fast
without "order by" using the pkey index -> fast
with "order by" using the bitmap -> fast
with "order by" using the pkey index -> slowwwwwwwwwwwww

--
Laurent "ker2x" Laborde
Sysadmin & DBA at http://www.over-blog.com/

pgsql-performance by date:

Previous
From: Greg Stark
Date:
Subject: Re: Cost of sort/order by not estimated by the query planner
Next
From: Robert Haas
Date:
Subject: Re: Cost of sort/order by not estimated by the query planner