Thread: Avoiding duplication of code via views -- slower? How do people typically do this?

See
https://gist.github.com/joevandyk/4957646/raw/86d55472ff8b5a4a6740d9c673d18a7005738467/gistfile1.txtfor
the code.

I have promotions(id, end_at, quantity) and promotion_usages(promotion_id).

I have a couple of things I typically want to retrieve, and I'd like those
things to be composable.  In this case, finding recently-expired
promotions, finding promotions that have a quantity of one, and finding
promotions that were used.

My approach is to put these conditions into views, then I can join against
each one. But that approach is much slower than inlining all the code.

How is this typically done?

Thanks,
Joe

Re: Avoiding duplication of code via views -- slower? How do people typically do this?

From
Jack Christensen
Date:
Joe Van Dyk wrote:
> See
> https://gist.github.com/joevandyk/4957646/raw/86d55472ff8b5a4a6740d9c673d18a7005738467/gistfile1.txt
> for the code.
>
> I have promotions(id, end_at, quantity) and
> promotion_usages(promotion_id).
>
> I have a couple of things I typically want to retrieve, and I'd like
> those things to be composable.  In this case, finding recently-expired
> promotions, finding promotions that have a quantity of one, and
> finding promotions that were used.
>
> My approach is to put these conditions into views, then I can join
> against each one. But that approach is much slower than inlining all
> the code.
>
> How is this typically done?
>
> Thanks,
> Joe
>
>
 From your first example on the gist I extracted this. It should avoid
the multiple scans and hash join the the join of the two views suffers
from.

create view promotions_with_filters as (
   select *,
     end_at > now() - '30 days'::interval as recently_expired,
     quantity = 1 as one_time_use,
     exists(select 1 from promotion_usages pu on pu.promotion_id = p.id)
as used
   from promotions
);

select count(*) from promotions_with_filters where recently_expired and
one_time_use;
On Thu, Feb 14, 2013 at 6:31 PM, Jack Christensen
<jack@jackchristensen.com>wrote:

> Joe Van Dyk wrote:
>
>> See https://gist.github.com/**joevandyk/4957646/raw/**
>>
86d55472ff8b5a4a6740d9c673d18a**7005738467/gistfile1.txt<https://gist.github.com/joevandyk/4957646/raw/86d55472ff8b5a4a6740d9c673d18a7005738467/gistfile1.txt>for
thecode. 
>>
>> I have promotions(id, end_at, quantity) and promotion_usages(promotion_id)
>> **.
>>
>> I have a couple of things I typically want to retrieve, and I'd like
>> those things to be composable.  In this case, finding recently-expired
>> promotions, finding promotions that have a quantity of one, and finding
>> promotions that were used.
>>
>> My approach is to put these conditions into views, then I can join
>> against each one. But that approach is much slower than inlining all the
>> code.
>>
>> How is this typically done?
>>
>> Thanks,
>> Joe
>>
>>
>>  From your first example on the gist I extracted this. It should avoid
> the multiple scans and hash join the the join of the two views suffers from.
>
> create view promotions_with_filters as (
>   select *,
>     end_at > now() - '30 days'::interval as recently_expired,
>     quantity = 1 as one_time_use,
>     exists(select 1 from promotion_usages pu on pu.promotion_id = p.id)
> as used
>   from promotions
> );
>
> select count(*) from promotions_with_filters where recently_expired and
> one_time_use;
>


Perhaps I fat-fingered something somewhere... I tried that and I got this:
https://gist.github.com/joevandyk/4958906/raw/5561f95ef2b5d82f81ab14913c4d36f6aac3ee0a/gistfile1.txt

The with_filters view uses a different plan.
On Thu, Feb 14, 2013 at 4:32 PM, Joe Van Dyk <joe@tanga.com> wrote:

> See
> https://gist.github.com/joevandyk/4957646/raw/86d55472ff8b5a4a6740d9c673d18a7005738467/gistfile1.txtfor the code.
>
> I have promotions(id, end_at, quantity) and promotion_usages(promotion_id).
>
> I have a couple of things I typically want to retrieve, and I'd like those
> things to be composable.  In this case, finding recently-expired
> promotions, finding promotions that have a quantity of one, and finding
> promotions that were used.
>
> My approach is to put these conditions into views, then I can join against
> each one. But that approach is much slower than inlining all the code.
>
> How is this typically done?
>

First I am not usually a fan of trying to reduce code duplication by using
views.   In general, my experience is that this makes it very easy to make
things slower, and it adds  unexpected optimization hedges in unexpected
places.

Your problem here seems to be of this sort.  You are joining together two
views in order to add filters.  These operations are not really guaranteed
to be the same and so you have an unexpected optimization fence.

My general rule of thumb is to consider moving inline views and WITH
clauses into views as needed.

Now I think there are a bunch of ways to accomplish what you are trying to
do here.

 At the risk of jumping straight ahead into advanced functionality and the
accusations that I am making use of magic wands, I will suggest an
object-relational approach to reducing code duplication.  This would be to
eliminate most your filter views and make use instead of table methods.

CREATE FUNCTION recently_expired(promotion) returns bool language sql
immutable as
$$
select $1.ended at > now() - '30 days'::interval;
$$;

CREATE FUNCTION is_one_time(promotion) returns bool language sql immutable
as
$$
select $1.quantity = 1;
$$;

The one thing is you'd probably have to manually write in your join against
promotion_uses to make that effective,  But you could instead do:

select p.id from promotions p join promotion_usages pu on pu.promotion_id = p.id

where p.is_one_time and p.recently_expired;

Re: Avoiding duplication of code via views -- slower? How do people typically do this?

From
Jack Christensen
Date:
Joe Van Dyk wrote:
>
> Perhaps I fat-fingered something somewhere... I tried that and I got
> this:
> https://gist.github.com/joevandyk/4958906/raw/5561f95ef2b5d82f81ab14913c4d36f6aac3ee0a/gistfile1.txt
>
> The with_filters view uses a different plan.
Interesting. It is avoiding the hash join, but it is still evaluating
the exists column even when it is not referenced at all in the select. I
would have expected the optimizer to remove it entirely.
Joe Van Dyk <joe@tanga.com> writes:
> Perhaps I fat-fingered something somewhere... I tried that and I got this:
> https://gist.github.com/joevandyk/4958906/raw/5561f95ef2b5d82f81ab14913c4d36f6aac3ee0a/gistfile1.txt

Try without the useless "is true" bits.

            regards, tom lane
On Fri, Feb 15, 2013 at 7:43 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Joe Van Dyk <joe@tanga.com> writes:
> > Perhaps I fat-fingered something somewhere... I tried that and I got
> this:
> >
> https://gist.github.com/joevandyk/4958906/raw/5561f95ef2b5d82f81ab14913c4d36f6aac3ee0a/gistfile1.txt
>
> Try without the useless "is true" bits.
>
>                         regards, tom lane
>


Huh, that did do the trick. Why does "is true" affect the plan?

without "is true" in the conditions:

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=16676.66..16676.67 rows=1 width=0) (actual time=
95.648..95.648 rows=1 loops=1)
   ->  Bitmap Heap Scan on promotions p  (cost=868.37..16619.49 rows=22868
width=0) (actual time=11.031..95.294 rows=2720 loops=1)
         Recheck Cond: (end_at > (now() - '30 days'::interval))
         Filter: ((quantity = 1) AND (SubPlan 1))
         Rows Removed by Filter: 43073
         ->  Bitmap Index Scan on index_promotions_on_end_at
 (cost=0.00..862.65 rows=46093 width=0) (actual time=10.783..10.783
rows=73234 loops=1)
               Index Cond: (end_at > (now() - '30 days'::interval))
         SubPlan 1
           ->  Index Only Scan using index_promotion_usages_on_promotion_id
on promotion_usages pu  (cost=0.00..20.54 rows=178 width=0) (actual
time=0.001..0.001 rows=0 loops=44998)
                 Index Cond: (promotion_id = p.id)
                 Heap Fetches: 2720
 Total runtime: 95.739 ms
(12 rows)


with "is true" in the conditions:

          QUERY PLAN


--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=94430.93..94430.94 rows=1 width=0) (actual
time=534.568..534.569 rows=1 loops=1)
   ->  Seq Scan on promotions p  (cost=0.00..94373.76 rows=22868 width=0)
(actual time=0.306..534.165 rows=2720 loops=1)
         Filter: (((quantity = 1) IS TRUE) AND ((end_at > (now() - '30
days'::interval)) IS TRUE) AND ((SubPlan 1) IS TRUE))
         Rows Removed by Filter: 600105
         SubPlan 1
           ->  Index Only Scan using index_promotion_usages_on_promotion_id
on promotion_usages pu  (cost=0.00..20.54 rows=178 width=0) (actual
time=0.001..0.001 rows=0 loops=44998)
                 Index Cond: (promotion_id = p.id)
                 Heap Fetches: 2720
 Total runtime: 534.627 ms
(9 rows)
Joe Van Dyk <joe@tanga.com> writes:
> On Fri, Feb 15, 2013 at 7:43 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Try without the useless "is true" bits.

> Huh, that did do the trick. Why does "is true" affect the plan?

Because "(x = y) IS TRUE" isn't the same as "x = y".  (The behavior for
nulls is different.)  And the planner only knows about using the latter
type of condition for indexscans.  Since you need it to convert the
end_at condition into an indexscan to get a fast plan, you lose.

It's conceivable that we could teach the planner about this case, but
I haven't seen enough people doing that to make me think it's worth the
code space and planner cycles.

            regards, tom lane
On Fri, Feb 15, 2013 at 1:42 AM, Chris Travers <chris.travers@gmail.com> wrote:
> On Thu, Feb 14, 2013 at 4:32 PM, Joe Van Dyk <joe@tanga.com> wrote:
>>
>> See
>> https://gist.github.com/joevandyk/4957646/raw/86d55472ff8b5a4a6740d9c673d18a7005738467/gistfile1.txt
>> for the code.
>>
>> I have promotions(id, end_at, quantity) and
>> promotion_usages(promotion_id).
>>
>> I have a couple of things I typically want to retrieve, and I'd like those
>> things to be composable.  In this case, finding recently-expired promotions,
>> finding promotions that have a quantity of one, and finding promotions that
>> were used.
>>
>> My approach is to put these conditions into views, then I can join against
>> each one. But that approach is much slower than inlining all the code.
>>
>> How is this typically done?
>
>
> First I am not usually a fan of trying to reduce code duplication by using
> views.   In general, my experience is that this makes it very easy to make
> things slower, and it adds  unexpected optimization hedges in unexpected
> places.
>
> Your problem here seems to be of this sort.  You are joining together two
> views in order to add filters.  These operations are not really guaranteed
> to be the same and so you have an unexpected optimization fence.
>
> My general rule of thumb is to consider moving inline views and WITH clauses
> into views as needed.
>
> Now I think there are a bunch of ways to accomplish what you are trying to
> do here.
>
>  At the risk of jumping straight ahead into advanced functionality and the
> accusations that I am making use of magic wands, I will suggest an
> object-relational approach to reducing code duplication.  This would be to
> eliminate most your filter views and make use instead of table methods.
>
> CREATE FUNCTION recently_expired(promotion) returns bool language sql
> immutable as
> $$
> select $1.ended at > now() - '30 days'::interval;
> $$;
>
> CREATE FUNCTION is_one_time(promotion) returns bool language sql immutable
> as
> $$
> select $1.quantity = 1;
> $$;

Unfortunately from performance point of view that is a much worse way
to do things.  Pushing checks into function like that forces
processing into a iterative model which has a much worse set of
performance gotchas that have essentially no workaround. This is
because there is no way to force the function to be inlined.  What I'd
like to see is to have a new function decoration, INLINE, that
introduces some constraints to how the function can be written and
forces the function to be expanded in the query at plan time.  Only
then will there be a true alternative to using views, especially if
you could inline through a LATERAL function call.

merlin.