Thread: Factoring where clauses through UNIONS take 2
Okay, after upgrading Postgres to 7.3.2, I got some improvements to my query through factoring, but it was obvious my tables needed a little help. Therefore, I've modified my base tables and views, and now, even with enable_seqscan=off, it's not factoring through. Here is my query: select * from all_actions where campaign = 3; The full query plan is at the end of the email. Basically, all_actions is a view of several tables, each of which either have a field called "campaign" or "cached_campaign". It is indexed for all affected tables. However, for every table it does a sequential scan rather than an indexed scan. Any ideas on how to get it to at least attempt an indexed scan? More details can be provided if needed. Thank you in advance for your help. Here is the full query plan: Nested Loop (cost=300092946.09..300095861.21 rows=163 width=137) -> Hash Join (cost=300092946.09..300095335.40 rows=163 width=104) Hash Cond: ("outer".sent_mail = "inner".object_id) -> Subquery Scan aa (cost=300088861.93..300091114.08 rows=18017 width=134) -> Unique (cost=300088861.93..300091114.08 rows=18017 width=134) -> Sort (cost=300088861.93..300089312.36 rows=180172 width=134) Sort Key: sent_mail, occurrence_date, "type", data -> Append (cost=0.00..300057538.85 rows=180172 width=134) -> Subquery Scan "*SELECT* 1" (cost=0.00..764.30 rows=11118 width=80) -> Merge Join (cost=0.00..764.30 rows=11118 width=80) Merge Cond: ("outer".link = "inner".object_id) -> Index Scan using action_click_link_index on action_click ac (cost=0.00..568.78 rows=11118 width=24) -> Index Scan using campaign_links_pkey on campaign_links cl (cost=0.00..27.98 rows=722 width=56) -> Subquery Scan "*SELECT* 2" (cost=100000000.00..100000069.00 rows=1400 width=60) -> Seq Scan on action_reply ar (cost=100000000.00..100000069.00 rows=1400 width=60) -> Subquery Scan "*SELECT* 3" (cost=100000000.00..100000092.27 rows=2327 width=134) -> Seq Scan on action_forward af (cost=100000000.00..100000092.27 rows=2327 width=134) -> Subquery Scan "*SELECT* 4" (cost=100000000.00..100000018.86 rows=586 width=16) -> Seq Scan on action_unsubscribe au (cost=100000000.00..100000018.86 rows=586 width=16) -> Subquery Scan "*SELECT* 5" (cost=0.00..18497.71 rows=39073 width=33) -> Merge Join (cost=0.00..18497.71 rows=39073 width=33) Merge Cond: ("outer".list_member = "inner".object_id) Filter: (("inner".is_test = false) OR ("inner".is_test IS NULL)) -> Index Scan using campaign_sent_mails_list_member on campaign_sent_mails (cost=0.00..10409.94 rows=39073 width=24) Filter: (date_opened IS NOT NULL) -> Index Scan using list_members_pkey on list_members (cost=0.00..7206.64 rows=117749 width=9) -> Subquery Scan "*SELECT* 6" (cost=0.00..18395.81 rows=17829 width=33) -> Merge Join (cost=0.00..18395.81 rows=17829 width=33) Merge Cond: ("outer".list_member = "inner".object_id) Filter: (("inner".is_test = false) OR ("inner".is_test IS NULL)) -> Index Scan using campaign_sent_mails_list_member on campaign_sent_mails csm (cost=0.00..10679.53 rows=17829 width=24) Filter: (bounced = true) -> Index Scan using list_members_pkey on list_members lm (cost=0.00..7206.64 rows=117749 width=9) -> Subquery Scan "*SELECT* 7" (cost=0.00..19700.90 rows=107838 width=33) -> Merge Join (cost=0.00..19700.90 rows=107838 width=33) Merge Cond: ("outer".list_member = "inner".object_id) Filter: (("inner".is_test = false) OR ("inner".is_test IS NULL)) -> Index Scan using campaign_sent_mails_list_member on campaign_sent_mails csm (cost=0.00..10409.94 rows=107838 width=24) Filter: (date_sent IS NOT NULL) -> Index Scan using list_members_pkey on list_members lm (cost=0.00..7206.64 rows=117749 width=9) -> Hash (cost=4081.72..4081.72 rows=973 width=24) -> Index Scan using campaign_sent_mails_pkey on campaign_sent_mails csm (cost=0.00..4081.72 rows=973 width=24) Filter: (campaign = 3) -> Index Scan using list_members_pkey on list_members lm (cost=0.00..3.22 rows=1 width=33) Index Cond: ("outer".list_member = lm.object_id) (45 rows) -----------
On Thu, 24 Apr 2003, Jonathan Bartlett wrote: > Okay, after upgrading Postgres to 7.3.2, I got some improvements to my > query through factoring, but it was obvious my tables needed a little > help. Therefore, I've modified my base tables and views, and now, even > with enable_seqscan=off, it's not factoring through. Here is my query: > > select * from all_actions where campaign = 3; > > The full query plan is at the end of the email. Basically, all_actions is > a view of several tables, each of which either have a field called > "campaign" or "cached_campaign". It is indexed for all affected tables. > However, for every table it does a sequential scan rather than an indexed > scan. Any ideas on how to get it to at least attempt an indexed scan? What does the actual view look like? It seems to be pushing campaign=3 into a scan of campaign_sent_mails (although it's using a different index on that).
THe actual view is: create view all_actions_v2 as select sent_mail, 'REPLY' as type, cached_campaign as campaign, cached_list_member as list_member, reply_date as occurence_date, reply_subject as other_data from action_reply UNION select sent_mail, 'FORWARD' as type, cached_campaign as campaign, cached_list_member as list_member, forward_date as occurence_date, destination_email as other_data from action_forward UNION select ac.sent_mail, 'CLICK' as type, ac.cached_campaign as campaign, cached_list_member as list_member, ac.click_date as occurence_date, cl.url as other_data from action_click ac, campaign_links cl where ac.link = cl.object_id UNION select sent_mail, 'UNSUBSCRIBE' as type, cached_campaign as campaign, cached_list_member as list_member, unsubscribe_date as occurence_date, NULL::varchar as other_data from action_unsubscribe UNION select object_id as sent_mail, 'BOUNCE' as type, campaign, list_member, date_sent as occurence_date, NULL::varchar as other_data from campaign_sent_mails where bounced = true UNION select object_id as sent_mail, 'SENT' as type, campaign, list_member, date_sent as occurrence_date, NULL::varchar as other_data from campaign_sent_mails UNION select object_id as sent_mail, 'OPEN' as type, campaign, list_member, date_opened as occurrence_date, NULL::varchar as other_data from campaign_sent_mails where date_opened is not NULL; Manually factoring the where clause through would produce this: select sent_mail, 'REPLY' as type, cached_campaign as campaign, cached_list_member as list_member, reply_date as occurence_date, reply_subject as other_data from action_reply where cached_campaign = 3 UNION select sent_mail, 'FORWARD' as type, cached_campaign as campaign, cached_list_member as list_member, forward_date as occurence_date, destination_email as other_data from action_forward where cached_campaign = 3 UNION select ac.sent_mail, 'CLICK' as type, ac.cached_campaign as campaign, cached_list_member as list_member, ac.click_date as occurence_date, cl.url as other_data from action_click ac, campaign_links cl where ac.link = cl.object_id and cached_campaign = 3 UNION select sent_mail, 'UNSUBSCRIBE' as type, cached_campaign as campaign, cached_list_member as list_member, unsubscribe_date as occurence_date, NULL::varchar as other_data from action_unsubscribe where cached_campaign = 3 UNION select object_id as sent_mail, 'BOUNCE' as type, campaign, list_member, date_sent as occurence_date, NULL::varchar as other_data from campaign_sent_mails where bounced = true and campaign = 3 UNION select object_id as sent_mail, 'SENT' as type, campaign, list_member, date_sent as occurrence_date, NULL::varchar as other_data from campaign_sent_mails where campaign = 3 UNION select object_id as sent_mail, 'OPEN' as type, campaign, list_member, date_opened as occurrence_date, NULL::varchar as other_data from campaign_sent_mails where date_opened is not NULL and campaign = 3; Using the view, the request takes 10 seconds. Manually factoring takes 2 seconds. I'd like to keep the view if possible for a little better data discipline. Jon On Thu, 24 Apr 2003, Stephan Szabo wrote: > > On Thu, 24 Apr 2003, Jonathan Bartlett wrote: > > > Okay, after upgrading Postgres to 7.3.2, I got some improvements to my > > query through factoring, but it was obvious my tables needed a little > > help. Therefore, I've modified my base tables and views, and now, even > > with enable_seqscan=off, it's not factoring through. Here is my query: > > > > select * from all_actions where campaign = 3; > > > > The full query plan is at the end of the email. Basically, all_actions is > > a view of several tables, each of which either have a field called > > "campaign" or "cached_campaign". It is indexed for all affected tables. > > However, for every table it does a sequential scan rather than an indexed > > scan. Any ideas on how to get it to at least attempt an indexed scan? > > What does the actual view look like? It seems to be pushing campaign=3 > into a scan of campaign_sent_mails (although it's using a different index > on that). >
On Thu, 24 Apr 2003, Jonathan Bartlett wrote: > THe actual view is: > > create view all_actions_v2 as select sent_mail, 'REPLY' as type, > cached_campaign as campaign, cached_list_member as list_member, reply_date > as occurence_date, reply_subject as other_data from action_reply UNION > select sent_mail, 'FORWARD' as type, cached_campaign as campaign, > cached_list_member as list_member, forward_date as occurence_date, > destination_email as other_data from action_forward UNION select > ac.sent_mail, 'CLICK' as type, ac.cached_campaign as campaign, > cached_list_member as list_member, ac.click_date as occurence_date, cl.url > as other_data from action_click ac, campaign_links cl where ac.link = > cl.object_id UNION select sent_mail, 'UNSUBSCRIBE' as type, > cached_campaign as campaign, cached_list_member as list_member, > unsubscribe_date as occurence_date, NULL::varchar as other_data from > action_unsubscribe UNION select object_id as sent_mail, 'BOUNCE' as type, > campaign, list_member, date_sent as occurence_date, NULL::varchar as > other_data from campaign_sent_mails where bounced = true UNION select > object_id as sent_mail, 'SENT' as type, campaign, list_member, date_sent > as occurrence_date, NULL::varchar as other_data from campaign_sent_mails > UNION select object_id as sent_mail, 'OPEN' as type, campaign, > list_member, date_opened as occurrence_date, NULL::varchar as other_data > from campaign_sent_mails where date_opened is not NULL; On a probably unrelated side note, ISTM you can use UNION ALL rather than UNION in the above and lose the uniqueness test at the end (since the constant strings seem different for each one, it shouldn't do anything). The thing that I think is killing it is the constants. A view like: create view qv1 as select a as b from q1 union select b from q2; explain select * from qv1 where b=3; pushes down into the selects. create view qv1 as select a as b, 'b' from q1 union select b,'c' from q2; explain select * from qv1 where b=3; doesn't.
Stephan Szabo <sszabo@megazone23.bigpanda.com> writes: > The thing that I think is killing it is the constants. A view like: > create view qv1 as select a as b from q1 union select b from q2; > explain select * from qv1 where b=3; > pushes down into the selects. > create view qv1 as select a as b, 'b' from q1 union select b,'c' from q2; > explain select * from qv1 where b=3; > doesn't. Good catch. It would work the way Jonathan wants if he explicitly coerces all those literals in the view definition to text (or *some* specific datatype, anyway). The reason the planner won't push down is that the result type of the UNION's second column is TEXT, while the result types of the individual sub-selects are UNKNOWN, and there are semantic issues with pushing down a qual past a datatype conversion: it might not mean quite the same thing. (Consider text vs char(n) and significant-blanks rules, for instance.) In this particular case, since the qual doesn't actually touch the type-converted column, it would have been safe to push down, but the planner doesn't make this test on a per-qual basis: if there are type conversions anywhere in the UNION it just punts. regards, tom lane
I said: > In this particular case, since the qual doesn't actually touch the > type-converted column, it would have been safe to push down, but the > planner doesn't make this test on a per-qual basis: if there are type > conversions anywhere in the UNION it just punts. CVS tip has now had its consciousness raised on this point ;-). Turns out that it's not significantly more expensive to apply the test to individual vars instead of having a blanket check on the whole query, so I fixed it. Jonathan will still need to put the explicit casts into his view for use with 7.3.*, though. He probably should do that anyway, because even in 7.4 a restriction on the "type" column would not get pushed down as desired if the literals are UNKNOWN type. regards, tom lane
Does this apply to NULLs as well? Thank you for your extremely prompt, intelligent, and helpful replies. Jon On Thu, 24 Apr 2003, Tom Lane wrote: > I said: > > In this particular case, since the qual doesn't actually touch the > > type-converted column, it would have been safe to push down, but the > > planner doesn't make this test on a per-qual basis: if there are type > > conversions anywhere in the UNION it just punts. > > CVS tip has now had its consciousness raised on this point ;-). Turns > out that it's not significantly more expensive to apply the test to > individual vars instead of having a blanket check on the whole query, > so I fixed it. > > Jonathan will still need to put the explicit casts into his view for > use with 7.3.*, though. He probably should do that anyway, because > even in 7.4 a restriction on the "type" column would not get pushed > down as desired if the literals are UNKNOWN type. > > regards, tom lane >
Jonathan Bartlett <johnnyb@eskimo.com> writes: > Does this apply to NULLs as well? Yes, NULL literals are type UNKNOWN until proved otherwise. regards, tom lane