I continue to be stumped by this. You are right that I should have
listed the estimates provided by explain... basically for the select where
bar = 41, it's estimating there will be 40,000 rows instead of 7, out of
what's actuallly 5 million records in the table.
So far I've tried increase statistics for the bar column from the
default 10 to 100 (vacuum analyzing after) and the explain-plan hasn't
changed. I also notice that afterward, the pg_stats record for the bar
column still only lists the top 5 values of bar (out of 68 unique values
in the table). Are there any other settings I could try to improve the
detail of the statistics?
By the way, I think I do have a workaround for this particular query:
select * from (select * from foo where barId = 412 order by id
desc) as tempview limit 25;
This query uses the bar index and completes instantly. However, I feel
like I should find the heart of the problem, since bad statistics could
end up affecting other plans, right?
- Mike
On Mon, 7 Mar 2005 23:03:43 -0800 (PST), Stephan Szabo
<sszabo@megazone.bigpanda.com> wrote:
> On Mon, 7 Mar 2005, Michael McFarland wrote:
>
>> I'm trying to understand why a particular query is slow, and it seems
>> like the optimizer is choosing a strange plan. See this summary:
>>
>>
>> * I have a large table, with an index on the primary key 'id' and on a
>> field 'foo'.
>> > select count(*) from foo;
>> 1,000,000
>> > select count(*) from foo where bar = 41;
>> 7
>>
>> * This query happens very quickly.
>> > explain select * from foo where barId = 412 order by id desc;
>> Sort ()
>> Sort key= id
>> -> Index scan using bar_index on foo ()
>> Index cond: barId = 412
>>
>> But this query takes forever
>>
>> > explain select * from foo where barId = 412 order by id desc limit 25;
>> Limit ()
>> -> Index scan backward using primarykey_index
>> Filter: barID = 412
>
> You didn't show the row estimates, but I'd guess that it's expecting
> either that ther are more rows that match barId=412 than there actually
> are (which may be solvable by raising the statistics target on the column
> and re-analyzing) such that going backwards on id in order to make 25
> matching rows isn't a bad plan or that barId and id are correlated which
> is unfortunately not going to be recognized right now.
>
--
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/