Re: Bad query performance with more conditions? - Mailing list pgsql-general

From Alban Hertroys
Subject Re: Bad query performance with more conditions?
Date
Msg-id 71317720-55C7-4BB4-AA74-90EEB91086FF@solfertje.student.utwente.nl
Whole thread Raw
In response to Fwd: Bad query performance with more conditions?  ("kaifeng.zhu" <cafeeee@gmail.com>)
List pgsql-general
On 24 Jun 2010, at 11:13, kaifeng.zhu wrote:

> <resend to mailing list>
>
> On Thu, Jun 24, 2010 at 16:57, Thom Brown <thombrown@gmail.com> wrote:
>> Sounds like the planner took a wrong turn in the 2nd case.  Which
>> version of PostgreSQL are you running?
>
> PostgreSQL version 8.1.21 (With schemas)
>
>
> The explain commands show that:

For a next time, explain analyse would have been more useful.

>
> db1=> explain select * from emails where email_msg_id =
> '4dba381b-f55e-02d3-4b4a-95e2e98178e2' limit 1;
>                                                      QUERY PLAN
> -------------------------------------------------------------------------------------------------------------------
> Limit  (cost=0.00..4.00 rows=1 width=184)
>   ->  Index Scan using idx_emails_email_msg_id on emails
> (cost=0.00..370016.42 rows=92413 width=184)

I think your problem is here. I assume that msg-id's are fairly unique between messages, but the planner expects there
areover 90,000 records matching this ID. Hence the planner thinks this index has a low selectivity. 

>         Index Cond: (email_msg_id =
> '4dba381b-f55e-02d3-4b4a-95e2e98178e2'::bpchar)
> (3 rows)
>
> db1=> explain select * from emails where email_msg_id =
> '4dba381b-f55e-02d3-4b4a-95e2e98178e2' and email_sender =
> 'sender@domain.com' limit 1;
>
>    QUERY PLAN
>
---------------------------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=0.00..801.06 rows=1 width=184)
>   ->  Index Scan using idx_emails_email_sender on emails
> (cost=0.00..370089.46 rows=462 width=184)

For this index the planner only expects 462 rows, so it has a higher selectivity than the index on msg_id and therefore
theplanner prefers this index over the other one. 

>         Index Cond: (email_sender = 'sender@domain.com'::text)
>         Filter: (email_msg_id = '4dba381b-f55e-02d3-4b4a-95e2e98178e2'::bpchar)

Apparently there are quite a few records matching that sender, and the database has to check each of them whether they
havethe requested msg_id or not. Apparently the planner decided that a seqscan on the results would be the most
efficienthere - and it would indeed be fairly quick if there are only 462 rows with this sender. 

In summary, I think your statistics are off. Do you vacuum frequently enough? Autovacuum helps here, but there have
beenlarge improvements to that in later versions. 

Another approach would be an index on (email_sender, email_msg_id) - that would particularly help the second query and
itshouldn't hurt queries on just email_sender much. 

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.


!DSPAM:737,4c2325a8286216564294622!



pgsql-general by date:

Previous
From: Michelle Konzack
Date:
Subject: HA for PostgreSQL (Auth-Server)
Next
From: tv@fuzzy.cz
Date:
Subject: Re: Bad query performance with more conditions?