Thread: Factoring where clauses through UNIONS take 2

Factoring where clauses through UNIONS take 2

From
Jonathan Bartlett
Date:
Okay, after upgrading Postgres to 7.3.2, I got some improvements to my
query through factoring, but it was obvious my tables needed a little
help.  Therefore, I've modified my base tables and views, and now, even
with enable_seqscan=off, it's not factoring through.  Here is my query:

select * from all_actions where campaign = 3;

The full query plan is at the end of the email.  Basically, all_actions is
a view of several tables, each of which either have a field called
"campaign" or "cached_campaign".  It is indexed for all affected tables.
However, for every table it does a sequential scan rather than an indexed
scan.  Any ideas on how to get it to at least attempt an indexed scan?

More details can be provided if needed.  Thank you in advance for your
help.

Here is the full query plan:

 Nested Loop (cost=300092946.09..300095861.21 rows=163 width=137)
   -> Hash Join (cost=300092946.09..300095335.40 rows=163 width=104)
         Hash Cond: ("outer".sent_mail = "inner".object_id)
         -> Subquery Scan aa (cost=300088861.93..300091114.08 rows=18017
width=134)
               -> Unique (cost=300088861.93..300091114.08 rows=18017
width=134)
                     -> Sort (cost=300088861.93..300089312.36 rows=180172
width=134)
                           Sort Key: sent_mail, occurrence_date, "type",
data
                           -> Append (cost=0.00..300057538.85 rows=180172
width=134)
                                 -> Subquery Scan "*SELECT* 1"
(cost=0.00..764.30 rows=11118 width=80)
                                       -> Merge Join (cost=0.00..764.30
rows=11118 width=80)
                                             Merge Cond: ("outer".link =
"inner".object_id)
                                             -> Index Scan using
action_click_link_index on action_click ac (cost=0.00..568.78 rows=11118
width=24)
                                             -> Index Scan using
campaign_links_pkey on campaign_links cl (cost=0.00..27.98 rows=722
width=56)
                                 -> Subquery Scan "*SELECT* 2"
(cost=100000000.00..100000069.00 rows=1400 width=60)
                                       -> Seq Scan on action_reply ar
(cost=100000000.00..100000069.00 rows=1400 width=60)
                                 -> Subquery Scan "*SELECT* 3"
(cost=100000000.00..100000092.27 rows=2327 width=134)
                                       -> Seq Scan on action_forward af
(cost=100000000.00..100000092.27 rows=2327 width=134)
                                 -> Subquery Scan "*SELECT* 4"
(cost=100000000.00..100000018.86 rows=586 width=16)
                                       -> Seq Scan on action_unsubscribe
au (cost=100000000.00..100000018.86 rows=586 width=16)
                                 -> Subquery Scan "*SELECT* 5"
(cost=0.00..18497.71 rows=39073 width=33)
                                       -> Merge Join (cost=0.00..18497.71
rows=39073 width=33)
                                             Merge Cond:
("outer".list_member = "inner".object_id)
                                             Filter: (("inner".is_test =
false) OR ("inner".is_test IS NULL))
                                             -> Index Scan using
campaign_sent_mails_list_member on campaign_sent_mails
(cost=0.00..10409.94 rows=39073 width=24)
                                                   Filter: (date_opened IS
NOT NULL)
                                             -> Index Scan using
list_members_pkey on list_members (cost=0.00..7206.64 rows=117749 width=9)
                                 -> Subquery Scan "*SELECT* 6"
(cost=0.00..18395.81 rows=17829 width=33)
                                       -> Merge Join (cost=0.00..18395.81
rows=17829 width=33)
                                             Merge Cond:
("outer".list_member = "inner".object_id)
                                             Filter: (("inner".is_test =
false) OR ("inner".is_test IS NULL))
                                             -> Index Scan using
campaign_sent_mails_list_member on campaign_sent_mails csm
(cost=0.00..10679.53 rows=17829 width=24)
                                                   Filter: (bounced =
true)
                                             -> Index Scan using
list_members_pkey on list_members lm (cost=0.00..7206.64 rows=117749
width=9)
                                 -> Subquery Scan "*SELECT* 7"
(cost=0.00..19700.90 rows=107838 width=33)
                                       -> Merge Join (cost=0.00..19700.90
rows=107838 width=33)
                                             Merge Cond:
("outer".list_member = "inner".object_id)
                                             Filter: (("inner".is_test =
false) OR ("inner".is_test IS NULL))
                                             -> Index Scan using
campaign_sent_mails_list_member on campaign_sent_mails csm
(cost=0.00..10409.94 rows=107838 width=24)
                                                   Filter: (date_sent IS
NOT NULL)
                                             -> Index Scan using
list_members_pkey on list_members lm (cost=0.00..7206.64 rows=117749
width=9)
         -> Hash (cost=4081.72..4081.72 rows=973 width=24)
               -> Index Scan using campaign_sent_mails_pkey on
campaign_sent_mails csm (cost=0.00..4081.72 rows=973 width=24)
                     Filter: (campaign = 3)
   -> Index Scan using list_members_pkey on list_members lm
(cost=0.00..3.22 rows=1 width=33)
         Index Cond: ("outer".list_member = lm.object_id) (45 rows)


-----------


Re: Factoring where clauses through UNIONS take 2

From
Stephan Szabo
Date:
On Thu, 24 Apr 2003, Jonathan Bartlett wrote:

> Okay, after upgrading Postgres to 7.3.2, I got some improvements to my
> query through factoring, but it was obvious my tables needed a little
> help.  Therefore, I've modified my base tables and views, and now, even
> with enable_seqscan=off, it's not factoring through.  Here is my query:
>
> select * from all_actions where campaign = 3;
>
> The full query plan is at the end of the email.  Basically, all_actions is
> a view of several tables, each of which either have a field called
> "campaign" or "cached_campaign".  It is indexed for all affected tables.
> However, for every table it does a sequential scan rather than an indexed
> scan.  Any ideas on how to get it to at least attempt an indexed scan?

What does the actual view look like?  It seems to be pushing campaign=3
into a scan of campaign_sent_mails (although it's using a different index
on that).


Re: Factoring where clauses through UNIONS take 2

From
Jonathan Bartlett
Date:
THe actual view is:

create view all_actions_v2 as select sent_mail, 'REPLY' as type,
cached_campaign as campaign, cached_list_member as list_member, reply_date
as occurence_date, reply_subject as other_data from action_reply UNION
select sent_mail, 'FORWARD' as type, cached_campaign as campaign,
cached_list_member as list_member, forward_date as occurence_date,
destination_email as other_data from action_forward UNION select
ac.sent_mail, 'CLICK' as type, ac.cached_campaign as campaign,
cached_list_member as list_member, ac.click_date as occurence_date, cl.url
as other_data from action_click ac, campaign_links cl where ac.link =
cl.object_id UNION select sent_mail, 'UNSUBSCRIBE' as type,
cached_campaign as campaign, cached_list_member as list_member,
unsubscribe_date as occurence_date, NULL::varchar as other_data from
action_unsubscribe UNION select object_id as sent_mail, 'BOUNCE' as type,
campaign, list_member, date_sent as occurence_date, NULL::varchar as
other_data from campaign_sent_mails where bounced = true UNION select
object_id as sent_mail, 'SENT' as type, campaign, list_member, date_sent
as occurrence_date, NULL::varchar as other_data from campaign_sent_mails
UNION select object_id as sent_mail, 'OPEN' as type, campaign,
list_member, date_opened as occurrence_date, NULL::varchar as other_data
from campaign_sent_mails where date_opened is not NULL;

Manually factoring the where clause through would produce this:

select sent_mail, 'REPLY' as type, cached_campaign as campaign,
cached_list_member as list_member, reply_date as occurence_date,
reply_subject as other_data from action_reply where cached_campaign = 3
UNION select sent_mail, 'FORWARD' as type, cached_campaign as campaign,
cached_list_member as list_member, forward_date as occurence_date,
destination_email as other_data from action_forward where cached_campaign
= 3 UNION select ac.sent_mail, 'CLICK' as type, ac.cached_campaign as
campaign, cached_list_member as list_member, ac.click_date as
occurence_date, cl.url as other_data from action_click ac, campaign_links
cl where ac.link = cl.object_id and cached_campaign = 3 UNION select
sent_mail, 'UNSUBSCRIBE' as type, cached_campaign as campaign,
cached_list_member as list_member, unsubscribe_date as occurence_date,
NULL::varchar as other_data from action_unsubscribe where cached_campaign
= 3 UNION select object_id as sent_mail, 'BOUNCE' as type, campaign,
list_member, date_sent as occurence_date, NULL::varchar as other_data from
campaign_sent_mails where bounced = true and campaign = 3 UNION select
object_id as sent_mail, 'SENT' as type, campaign, list_member, date_sent
as occurrence_date, NULL::varchar as other_data from campaign_sent_mails
where campaign = 3 UNION select object_id as sent_mail, 'OPEN' as type,
campaign, list_member, date_opened as occurrence_date, NULL::varchar as
other_data from campaign_sent_mails where date_opened is not NULL and
campaign = 3;

Using the view, the request takes 10 seconds.  Manually factoring takes 2
seconds.  I'd like to keep the view if possible for a little better data
discipline.

Jon

On Thu, 24 Apr 2003, Stephan Szabo wrote:

>
> On Thu, 24 Apr 2003, Jonathan Bartlett wrote:
>
> > Okay, after upgrading Postgres to 7.3.2, I got some improvements to my
> > query through factoring, but it was obvious my tables needed a little
> > help.  Therefore, I've modified my base tables and views, and now, even
> > with enable_seqscan=off, it's not factoring through.  Here is my query:
> >
> > select * from all_actions where campaign = 3;
> >
> > The full query plan is at the end of the email.  Basically, all_actions is
> > a view of several tables, each of which either have a field called
> > "campaign" or "cached_campaign".  It is indexed for all affected tables.
> > However, for every table it does a sequential scan rather than an indexed
> > scan.  Any ideas on how to get it to at least attempt an indexed scan?
>
> What does the actual view look like?  It seems to be pushing campaign=3
> into a scan of campaign_sent_mails (although it's using a different index
> on that).
>


Re: Factoring where clauses through UNIONS take 2

From
Stephan Szabo
Date:
On Thu, 24 Apr 2003, Jonathan Bartlett wrote:

> THe actual view is:
>
> create view all_actions_v2 as select sent_mail, 'REPLY' as type,
> cached_campaign as campaign, cached_list_member as list_member, reply_date
> as occurence_date, reply_subject as other_data from action_reply UNION
> select sent_mail, 'FORWARD' as type, cached_campaign as campaign,
> cached_list_member as list_member, forward_date as occurence_date,
> destination_email as other_data from action_forward UNION select
> ac.sent_mail, 'CLICK' as type, ac.cached_campaign as campaign,
> cached_list_member as list_member, ac.click_date as occurence_date, cl.url
> as other_data from action_click ac, campaign_links cl where ac.link =
> cl.object_id UNION select sent_mail, 'UNSUBSCRIBE' as type,
> cached_campaign as campaign, cached_list_member as list_member,
> unsubscribe_date as occurence_date, NULL::varchar as other_data from
> action_unsubscribe UNION select object_id as sent_mail, 'BOUNCE' as type,
> campaign, list_member, date_sent as occurence_date, NULL::varchar as
> other_data from campaign_sent_mails where bounced = true UNION select
> object_id as sent_mail, 'SENT' as type, campaign, list_member, date_sent
> as occurrence_date, NULL::varchar as other_data from campaign_sent_mails
> UNION select object_id as sent_mail, 'OPEN' as type, campaign,
> list_member, date_opened as occurrence_date, NULL::varchar as other_data
> from campaign_sent_mails where date_opened is not NULL;

On a probably unrelated side note, ISTM you can use UNION ALL rather than
UNION in the above and lose the uniqueness test at the end (since the
constant strings seem different for each one, it shouldn't do anything).

The thing that I think is killing it is the constants.  A view like:

create view qv1 as select a as b from q1 union select b from q2;
explain select * from qv1 where b=3;
pushes down into the selects.

create view qv1 as select a as b, 'b' from q1 union select b,'c' from q2;
explain select * from qv1 where b=3;
doesn't.


Re: Factoring where clauses through UNIONS take 2

From
Tom Lane
Date:
Stephan Szabo <sszabo@megazone23.bigpanda.com> writes:
> The thing that I think is killing it is the constants.  A view like:
> create view qv1 as select a as b from q1 union select b from q2;
> explain select * from qv1 where b=3;
> pushes down into the selects.
> create view qv1 as select a as b, 'b' from q1 union select b,'c' from q2;
> explain select * from qv1 where b=3;
> doesn't.

Good catch.  It would work the way Jonathan wants if he explicitly
coerces all those literals in the view definition to text (or *some*
specific datatype, anyway).

The reason the planner won't push down is that the result type of the
UNION's second column is TEXT, while the result types of the individual
sub-selects are UNKNOWN, and there are semantic issues with pushing down
a qual past a datatype conversion: it might not mean quite the same thing.
(Consider text vs char(n) and significant-blanks rules, for instance.)

In this particular case, since the qual doesn't actually touch the
type-converted column, it would have been safe to push down, but the
planner doesn't make this test on a per-qual basis: if there are type
conversions anywhere in the UNION it just punts.

            regards, tom lane


Re: Factoring where clauses through UNIONS take 2

From
Tom Lane
Date:
I said:
> In this particular case, since the qual doesn't actually touch the
> type-converted column, it would have been safe to push down, but the
> planner doesn't make this test on a per-qual basis: if there are type
> conversions anywhere in the UNION it just punts.

CVS tip has now had its consciousness raised on this point ;-).  Turns
out that it's not significantly more expensive to apply the test to
individual vars instead of having a blanket check on the whole query,
so I fixed it.

Jonathan will still need to put the explicit casts into his view for
use with 7.3.*, though.  He probably should do that anyway, because
even in 7.4 a restriction on the "type" column would not get pushed
down as desired if the literals are UNKNOWN type.

            regards, tom lane


Re: Factoring where clauses through UNIONS take 2

From
Jonathan Bartlett
Date:
Does this apply to NULLs as well?

Thank you for your extremely prompt, intelligent, and helpful replies.

Jon

On Thu, 24 Apr 2003, Tom Lane wrote:

> I said:
> > In this particular case, since the qual doesn't actually touch the
> > type-converted column, it would have been safe to push down, but the
> > planner doesn't make this test on a per-qual basis: if there are type
> > conversions anywhere in the UNION it just punts.
>
> CVS tip has now had its consciousness raised on this point ;-).  Turns
> out that it's not significantly more expensive to apply the test to
> individual vars instead of having a blanket check on the whole query,
> so I fixed it.
>
> Jonathan will still need to put the explicit casts into his view for
> use with 7.3.*, though.  He probably should do that anyway, because
> even in 7.4 a restriction on the "type" column would not get pushed
> down as desired if the literals are UNKNOWN type.
>
>             regards, tom lane
>


Re: Factoring where clauses through UNIONS take 2

From
Tom Lane
Date:
Jonathan Bartlett <johnnyb@eskimo.com> writes:
> Does this apply to NULLs as well?

Yes, NULL literals are type UNKNOWN until proved otherwise.

            regards, tom lane