Thread: Bad query performance with more conditions?
Hi All, I have a table named emails which created by following script: create table emails ( email_id bigserial primary key, email_msg_id char(36) not null, email_sender text not null) The table has more than 10 million of records and I have already created indexes on email_msg_id and email_sender Following script finished very quickly (less than 1 seconds): select * from emails where email_msg_id = '4dba381b-f55e-02d3-4b4a-95e2e98178e2' limit 1 But a similar script cost more than ten seconds: select * from emails where email_msg_id = '4dba381b-f55e-02d3-4b4a-95e2e98178e2' and email_sender_text = 'sender@domain.com' limit 1 Both those scripts returns 0 rows of records indeed. Any comment are highly appreciated.
On 24 June 2010 09:52, kaifeng.zhu <cafeeee@gmail.com> wrote: > Hi All, > > I have a table named emails which created by following script: > create table emails ( > email_id bigserial primary key, > email_msg_id char(36) not null, > email_sender text not null) > > The table has more than 10 million of records and I have already > created indexes on email_msg_id and email_sender > > Following script finished very quickly (less than 1 seconds): > select * from emails where email_msg_id = > '4dba381b-f55e-02d3-4b4a-95e2e98178e2' limit 1 > But a similar script cost more than ten seconds: > select * from emails where email_msg_id = > '4dba381b-f55e-02d3-4b4a-95e2e98178e2' and email_sender_text = > 'sender@domain.com' limit 1 > > Both those scripts returns 0 rows of records indeed. > > Any comment are highly appreciated. > Sounds like the planner took a wrong turn in the 2nd case. Which version of PostgreSQL are you running? Thom
<resend to mailing list> On Thu, Jun 24, 2010 at 16:57, Thom Brown <thombrown@gmail.com> wrote: > Sounds like the planner took a wrong turn in the 2nd case. Which > version of PostgreSQL are you running? PostgreSQL version 8.1.21 (With schemas) The explain commands show that: db1=> explain select * from emails where email_msg_id = '4dba381b-f55e-02d3-4b4a-95e2e98178e2' limit 1; QUERY PLAN ------------------------------------------------------------------------------------------------------------------- Limit (cost=0.00..4.00 rows=1 width=184) -> Index Scan using idx_emails_email_msg_id on emails (cost=0.00..370016.42 rows=92413 width=184) Index Cond: (email_msg_id = '4dba381b-f55e-02d3-4b4a-95e2e98178e2'::bpchar) (3 rows) db1=> explain select * from emails where email_msg_id = '4dba381b-f55e-02d3-4b4a-95e2e98178e2' and email_sender = 'sender@domain.com' limit 1; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=0.00..801.06 rows=1 width=184) -> Index Scan using idx_emails_email_sender on emails (cost=0.00..370089.46 rows=462 width=184) Index Cond: (email_sender = 'sender@domain.com'::text) Filter: (email_msg_id = '4dba381b-f55e-02d3-4b4a-95e2e98178e2'::bpchar) (4 rows)
On 24 Jun 2010, at 10:52, kaifeng.zhu wrote: > Hi All, > > I have a table named emails which created by following script: > create table emails ( > email_id bigserial primary key, > email_msg_id char(36) not null, > email_sender text not null) > > The table has more than 10 million of records and I have already > created indexes on email_msg_id and email_sender > > Following script finished very quickly (less than 1 seconds): > select * from emails where email_msg_id = > '4dba381b-f55e-02d3-4b4a-95e2e98178e2' limit 1 > But a similar script cost more than ten seconds: > select * from emails where email_msg_id = > '4dba381b-f55e-02d3-4b4a-95e2e98178e2' and email_sender_text = > 'sender@domain.com' limit 1 > > Both those scripts returns 0 rows of records indeed. > > Any comment are highly appreciated. You didn't provide explain analyse results for those queries, so I'm just guessing here, but I think you should add indicesto email_msg_id and email_sender_text to speed up those queries. Actually, from your results it would appear that you did create an index on the former, while you probably didn't on thelatter, causing the second query to have to scan sequentially through all records. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll see there is no forest. !DSPAM:737,4c2321f5286211760940954!
On Thu, Jun 24, 2010 at 17:14, Alban Hertroys <dalroi@solfertje.student.utwente.nl> wrote: > > You didn't provide explain analyse results for those queries, so I'm just guessing here, but I think you should add indicesto email_msg_id and email_sender_text to speed up those queries. > > Actually, from your results it would appear that you did create an index on the former, while you probably didn't on thelatter, causing the second query to have to scan sequentially through all records. > > Alban Hertroys > Thanks for your suggestion. The explain analyzes provided in another mail. While I did add indices for both email_msg_id and email_sender_text.
On 24 Jun 2010, at 11:13, kaifeng.zhu wrote: > <resend to mailing list> > > On Thu, Jun 24, 2010 at 16:57, Thom Brown <thombrown@gmail.com> wrote: >> Sounds like the planner took a wrong turn in the 2nd case. Which >> version of PostgreSQL are you running? > > PostgreSQL version 8.1.21 (With schemas) > > > The explain commands show that: For a next time, explain analyse would have been more useful. > > db1=> explain select * from emails where email_msg_id = > '4dba381b-f55e-02d3-4b4a-95e2e98178e2' limit 1; > QUERY PLAN > ------------------------------------------------------------------------------------------------------------------- > Limit (cost=0.00..4.00 rows=1 width=184) > -> Index Scan using idx_emails_email_msg_id on emails > (cost=0.00..370016.42 rows=92413 width=184) I think your problem is here. I assume that msg-id's are fairly unique between messages, but the planner expects there areover 90,000 records matching this ID. Hence the planner thinks this index has a low selectivity. > Index Cond: (email_msg_id = > '4dba381b-f55e-02d3-4b4a-95e2e98178e2'::bpchar) > (3 rows) > > db1=> explain select * from emails where email_msg_id = > '4dba381b-f55e-02d3-4b4a-95e2e98178e2' and email_sender = > 'sender@domain.com' limit 1; > > QUERY PLAN > --------------------------------------------------------------------------------------------------------------------------------------------------------- > Limit (cost=0.00..801.06 rows=1 width=184) > -> Index Scan using idx_emails_email_sender on emails > (cost=0.00..370089.46 rows=462 width=184) For this index the planner only expects 462 rows, so it has a higher selectivity than the index on msg_id and therefore theplanner prefers this index over the other one. > Index Cond: (email_sender = 'sender@domain.com'::text) > Filter: (email_msg_id = '4dba381b-f55e-02d3-4b4a-95e2e98178e2'::bpchar) Apparently there are quite a few records matching that sender, and the database has to check each of them whether they havethe requested msg_id or not. Apparently the planner decided that a seqscan on the results would be the most efficienthere - and it would indeed be fairly quick if there are only 462 rows with this sender. In summary, I think your statistics are off. Do you vacuum frequently enough? Autovacuum helps here, but there have beenlarge improvements to that in later versions. Another approach would be an index on (email_sender, email_msg_id) - that would particularly help the second query and itshouldn't hurt queries on just email_sender much. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll see there is no forest. !DSPAM:737,4c2325a8286216564294622!
> On Thu, Jun 24, 2010 at 17:14, Alban Hertroys > <dalroi@solfertje.student.utwente.nl> wrote: >> >> You didn't provide explain analyse results for those queries, so I'm >> just guessing here, but I think you should add indices to email_msg_id >> and email_sender_text to speed up those queries. >> >> Alban Hertroys >> > > Thanks for your suggestion. > The explain analyzes provided in another mail. No, you haven't. Alban asked for 'EXPLAIN ANALYZE' output but you've provided just 'EXPLAIN' output - that's a big difference, as we can't see if the statistics are off or what is going on. Alban actually provided some useful insights in one of the following e-mails, but EXPLAIN ANALYZE output would be useful to confirm his conclusions. Tomas