Re: adding 'limit' leads to very slow query - Mailing list pgsql-performance

From Michael McFarland
Subject Re: adding 'limit' leads to very slow query
Date
Msg-id opsndn2ujisvs6tg@localhost.localdomain
Whole thread Raw
In response to adding 'limit' leads to very slow query  ("Michael McFarland" <sidlonDoesntLikeSpam@yahoo.com>)
Responses Re: adding 'limit' leads to very slow query  (Stephan Szabo <sszabo@megazone.bigpanda.com>)
List pgsql-performance
    I continue to be stumped by this.  You are right that I should have
listed the estimates provided by explain... basically for the select where
bar = 41, it's estimating there will be 40,000 rows instead of 7, out of
what's actuallly 5 million records in the table.

    So far I've tried increase statistics for the bar column from the
default 10 to 100 (vacuum analyzing after) and the explain-plan hasn't
changed.  I also notice that afterward, the pg_stats record for the bar
column still only lists the top 5 values of bar (out of 68 unique values
in the table).  Are there any other settings I could try to improve the
detail of the statistics?

    By the way, I think I do have a workaround for this particular query:
         select * from (select * from foo where barId = 412 order by id
desc) as tempview limit 25;
This query uses the bar index and completes instantly.  However, I feel
like I should find the heart of the problem, since bad statistics could
end up affecting other plans, right?

  - Mike


On Mon, 7 Mar 2005 23:03:43 -0800 (PST), Stephan Szabo
<sszabo@megazone.bigpanda.com> wrote:

> On Mon, 7 Mar 2005, Michael McFarland wrote:
>
>>    I'm trying to understand why a particular query is slow, and it seems
>> like the optimizer is choosing a strange plan.  See this summary:
>>
>>
>> * I have a large table, with an index on the primary key 'id' and on a
>> field 'foo'.
>> > select count(*) from foo;
>> 1,000,000
>> > select count(*) from foo where bar = 41;
>> 7
>>
>> * This query happens very quickly.
>> > explain select * from foo where barId = 412 order by id desc;
>> Sort ()
>>    Sort key= id
>>   ->   Index scan using bar_index on foo ()
>>      Index cond: barId = 412
>>
>> But this query takes forever
>>
>> > explain select * from foo where barId = 412 order by id desc limit 25;
>> Limit ()
>>   ->   Index scan backward using primarykey_index
>>    Filter:  barID = 412
>
> You didn't show the row estimates, but I'd guess that it's expecting
> either that ther are more rows that match barId=412 than there actually
> are (which may be solvable by raising the statistics target on the column
> and re-analyzing) such that going backwards on id in order to make 25
> matching rows isn't a bad plan or that barId and id are correlated which
> is unfortunately not going to be recognized right now.
>

--
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/

pgsql-performance by date:

Previous
From: "Alexander Kirpa"
Date:
Subject: Re: Postgres on RAID5
Next
From: Arshavir Grigorian
Date:
Subject: Postgres on RAID5