Exclusion List - Mailing list pgsql-sql
From | Michael Richards |
---|---|
Subject | Exclusion List |
Date | |
Msg-id | 3B68202C.0001A3.93146@frodo.searchcanada.ca Whole thread Raw |
Responses |
Re: Exclusion List
(Tom Lane <tgl@sss.pgh.pa.us>)
|
List | pgsql-sql |
I've got 2 tables, one with a list of users (has only about 5000 entries) and another with a list of userids that have already been sent letters. I'm trying to efficiently join these two so I get every user who hasn't been sent a letter. The problem is, coupled with the 5 other joins on the user table the query can take about 5 minutes to run. It's performing the joins and then reducing the list. The reduction of the list doesn't seem to be terribly efficient. Here are some strategies I've been looking at: select id from users WHERE id not in (select userid from sentletters where lettertype=1) AND aclgroup IN (1,2); Seq Scan on users (cost=0.00..217751.39 rows=5369 width=4) SubPlan -> Seq Scan on sentletters (cost=0.00..81.06 rows=4405width=4) select id from users WHERE not exists (select userid from sentletters where lettertype=1 AND userid=id) AND aclgroup IN(1,2); Seq Scan on users (cost=0.00..10980.07 rows=1 width=4) SubPlan -> Index Scan using sentletters_userid_key on sentletters (cost=0.00..2.02 rows=1 width=4) select id from users AS u LEFT JOIN sentletters AS sl ON (lettertype=1 AND sl.userid=u.id) where sl.userid IS NULL AND u.aclgroup IN (1,2); Hash Join (cost=81.06..550.18 rows=5322 width=12) -> Seq Scan on users u (cost=0.00..152.53 rows=5322 width=4) -> Hash (cost=70.05..70.05 rows=4405 width=8) -> Seq Scan on sentletters sl (cost=0.00..70.05 rows=4405 width=8) All but the last which is an odd way to do it have nasty query plan and this isn't even talking about the joins. I then need to join these results with a table that has about 200,000 rows and is joined on the userid and some other key elements. Any suggestions on this? The full query causing the problem is: select u.id,u.firstname,u.surname,f2.strval,f3.strval,f4.strval, f5.strval,u2.firstname,u2.surname,m.strval from users as u JOIN dft_formdata as f1 ON (u.id=f1.userid AND u.aclgroup IN (1,2) AND f1.formid=1 AND f1.fieldid=1) LEFT JOIN dft_formdata as f2 ON (u.id=f2.userid AND f2.formid=1 AND f2.fieldid=2) JOIN dft_formdata asf3 on (u.id=f3.userid AND f3.formid=1 AND f3.fieldid=3) JOIN dft_formdata as f4 on (u.id=f4.userid AND f4.formid=1AND f4.fieldid=5) JOIN relations as r on (u.id=r.relatedto AND r.type=2) JOIN users as u2 on (u2.id=r.useridAND u2.aclgroup=200) JOIN dft_formdata as f5 on (u.id=f5.userid AND f5.formid=1 AND f5.fieldid=4) JOINdft_formmvl as m on (m.id=f5.intval and m.mvlid=1) JOIN ft_formdata as f6 on (u.id=f6.userid AND f6.formid=1 ANDf6.fieldid=155 AND f6.intval=3) WHERE not exists (select userid from sentletters WHERE userid=u.id); Nested Loop (cost=0.00..11280.10 rows=1 width=164)-> Nested Loop (cost=0.00..11276.36 rows=1 width=160) -> Nested Loop (cost=0.00..11274.33 rows=1 width=144) -> Nested Loop (cost=0.00..11270.59 rows=1 width=124) -> Nested Loop (cost=0.00..11268.56rows=1 width=96) -> Nested Loop (cost=0.00..10981.55 rows=1 width=88) -> Nested Loop (cost=0.00..10977.82rows=1 width=72) -> Nested Loop (cost=0.00..10974.10 rows=1 width=56) -> Nested Loop (cost=0.00..10970.37rows=1 width=32) -> Seq Scan on users u (cost=0.00..10966.65 rows=1 width=28) SubPlan -> Index Scan using sentletters_userid_key on sentletters (cost=0.00..2.01 rows=1 width=4) -> Index Scan using dft_formdata_userid_field on dft_formdata f1 (cost=0.00..3.71 rows=1 width=4) -> Index Scan using dft_formdata_userid_field on dft_formdata f2 (cost=0.00..3.71 rows=1 width=24) -> Index Scan using dft_formdata_userid_field on dft_formdata f3 (cost=0.00..3.71 rows=1 width=16) -> Index Scan using dft_formdata_userid_field on dft_formdata f4 (cost=0.00..3.71 rows=1 width=16) -> Seq Scan on relations r (cost=0.00..185.43 rows=5079 width=8) -> Index Scan using users_pkey on users u2 (cost=0.00..2.02 rows=1 width=28) -> Index Scan using dft_formdata_userid_field on dft_formdata f5 (cost=0.00..3.71 rows=1 width=20) -> Index Scan using dft_formmvl_pkey on dft_formmvl m (cost=0.00..2.02 rows=1 width=16)-> Index Scan using dft_formdata_userid_field on dft_formdata f6 (cost=0.00..3.71 rows=1 width=4) Yes I know the query itself is really nasty but I think 5 minutes is a little excessive. -Michael _________________________________________________________________ http://fastmail.ca/ - Fast Free Web Email for Canadians