Thread: Avoiding duplication of code via views -- slower? How do people typically do this?
Avoiding duplication of code via views -- slower? How do people typically do this?
From
Joe Van Dyk
Date:
See https://gist.github.com/joevandyk/4957646/raw/86d55472ff8b5a4a6740d9c673d18a7005738467/gistfile1.txtfor the code. I have promotions(id, end_at, quantity) and promotion_usages(promotion_id). I have a couple of things I typically want to retrieve, and I'd like those things to be composable. In this case, finding recently-expired promotions, finding promotions that have a quantity of one, and finding promotions that were used. My approach is to put these conditions into views, then I can join against each one. But that approach is much slower than inlining all the code. How is this typically done? Thanks, Joe
Re: Avoiding duplication of code via views -- slower? How do people typically do this?
From
Jack Christensen
Date:
Joe Van Dyk wrote: > See > https://gist.github.com/joevandyk/4957646/raw/86d55472ff8b5a4a6740d9c673d18a7005738467/gistfile1.txt > for the code. > > I have promotions(id, end_at, quantity) and > promotion_usages(promotion_id). > > I have a couple of things I typically want to retrieve, and I'd like > those things to be composable. In this case, finding recently-expired > promotions, finding promotions that have a quantity of one, and > finding promotions that were used. > > My approach is to put these conditions into views, then I can join > against each one. But that approach is much slower than inlining all > the code. > > How is this typically done? > > Thanks, > Joe > > From your first example on the gist I extracted this. It should avoid the multiple scans and hash join the the join of the two views suffers from. create view promotions_with_filters as ( select *, end_at > now() - '30 days'::interval as recently_expired, quantity = 1 as one_time_use, exists(select 1 from promotion_usages pu on pu.promotion_id = p.id) as used from promotions ); select count(*) from promotions_with_filters where recently_expired and one_time_use;
Re: Avoiding duplication of code via views -- slower? How do people typically do this?
From
Joe Van Dyk
Date:
On Thu, Feb 14, 2013 at 6:31 PM, Jack Christensen <jack@jackchristensen.com>wrote: > Joe Van Dyk wrote: > >> See https://gist.github.com/**joevandyk/4957646/raw/** >> 86d55472ff8b5a4a6740d9c673d18a**7005738467/gistfile1.txt<https://gist.github.com/joevandyk/4957646/raw/86d55472ff8b5a4a6740d9c673d18a7005738467/gistfile1.txt>for thecode. >> >> I have promotions(id, end_at, quantity) and promotion_usages(promotion_id) >> **. >> >> I have a couple of things I typically want to retrieve, and I'd like >> those things to be composable. In this case, finding recently-expired >> promotions, finding promotions that have a quantity of one, and finding >> promotions that were used. >> >> My approach is to put these conditions into views, then I can join >> against each one. But that approach is much slower than inlining all the >> code. >> >> How is this typically done? >> >> Thanks, >> Joe >> >> >> From your first example on the gist I extracted this. It should avoid > the multiple scans and hash join the the join of the two views suffers from. > > create view promotions_with_filters as ( > select *, > end_at > now() - '30 days'::interval as recently_expired, > quantity = 1 as one_time_use, > exists(select 1 from promotion_usages pu on pu.promotion_id = p.id) > as used > from promotions > ); > > select count(*) from promotions_with_filters where recently_expired and > one_time_use; > Perhaps I fat-fingered something somewhere... I tried that and I got this: https://gist.github.com/joevandyk/4958906/raw/5561f95ef2b5d82f81ab14913c4d36f6aac3ee0a/gistfile1.txt The with_filters view uses a different plan.
Re: Avoiding duplication of code via views -- slower? How do people typically do this?
From
Chris Travers
Date:
On Thu, Feb 14, 2013 at 4:32 PM, Joe Van Dyk <joe@tanga.com> wrote: > See > https://gist.github.com/joevandyk/4957646/raw/86d55472ff8b5a4a6740d9c673d18a7005738467/gistfile1.txtfor the code. > > I have promotions(id, end_at, quantity) and promotion_usages(promotion_id). > > I have a couple of things I typically want to retrieve, and I'd like those > things to be composable. In this case, finding recently-expired > promotions, finding promotions that have a quantity of one, and finding > promotions that were used. > > My approach is to put these conditions into views, then I can join against > each one. But that approach is much slower than inlining all the code. > > How is this typically done? > First I am not usually a fan of trying to reduce code duplication by using views. In general, my experience is that this makes it very easy to make things slower, and it adds unexpected optimization hedges in unexpected places. Your problem here seems to be of this sort. You are joining together two views in order to add filters. These operations are not really guaranteed to be the same and so you have an unexpected optimization fence. My general rule of thumb is to consider moving inline views and WITH clauses into views as needed. Now I think there are a bunch of ways to accomplish what you are trying to do here. At the risk of jumping straight ahead into advanced functionality and the accusations that I am making use of magic wands, I will suggest an object-relational approach to reducing code duplication. This would be to eliminate most your filter views and make use instead of table methods. CREATE FUNCTION recently_expired(promotion) returns bool language sql immutable as $$ select $1.ended at > now() - '30 days'::interval; $$; CREATE FUNCTION is_one_time(promotion) returns bool language sql immutable as $$ select $1.quantity = 1; $$; The one thing is you'd probably have to manually write in your join against promotion_uses to make that effective, But you could instead do: select p.id from promotions p join promotion_usages pu on pu.promotion_id = p.id where p.is_one_time and p.recently_expired;
Re: Avoiding duplication of code via views -- slower? How do people typically do this?
From
Jack Christensen
Date:
Joe Van Dyk wrote: > > Perhaps I fat-fingered something somewhere... I tried that and I got > this: > https://gist.github.com/joevandyk/4958906/raw/5561f95ef2b5d82f81ab14913c4d36f6aac3ee0a/gistfile1.txt > > The with_filters view uses a different plan. Interesting. It is avoiding the hash join, but it is still evaluating the exists column even when it is not referenced at all in the select. I would have expected the optimizer to remove it entirely.
Re: Avoiding duplication of code via views -- slower? How do people typically do this?
From
Tom Lane
Date:
Joe Van Dyk <joe@tanga.com> writes: > Perhaps I fat-fingered something somewhere... I tried that and I got this: > https://gist.github.com/joevandyk/4958906/raw/5561f95ef2b5d82f81ab14913c4d36f6aac3ee0a/gistfile1.txt Try without the useless "is true" bits. regards, tom lane
Re: Avoiding duplication of code via views -- slower? How do people typically do this?
From
Joe Van Dyk
Date:
On Fri, Feb 15, 2013 at 7:43 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Joe Van Dyk <joe@tanga.com> writes: > > Perhaps I fat-fingered something somewhere... I tried that and I got > this: > > > https://gist.github.com/joevandyk/4958906/raw/5561f95ef2b5d82f81ab14913c4d36f6aac3ee0a/gistfile1.txt > > Try without the useless "is true" bits. > > regards, tom lane > Huh, that did do the trick. Why does "is true" affect the plan? without "is true" in the conditions: -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=16676.66..16676.67 rows=1 width=0) (actual time= 95.648..95.648 rows=1 loops=1) -> Bitmap Heap Scan on promotions p (cost=868.37..16619.49 rows=22868 width=0) (actual time=11.031..95.294 rows=2720 loops=1) Recheck Cond: (end_at > (now() - '30 days'::interval)) Filter: ((quantity = 1) AND (SubPlan 1)) Rows Removed by Filter: 43073 -> Bitmap Index Scan on index_promotions_on_end_at (cost=0.00..862.65 rows=46093 width=0) (actual time=10.783..10.783 rows=73234 loops=1) Index Cond: (end_at > (now() - '30 days'::interval)) SubPlan 1 -> Index Only Scan using index_promotion_usages_on_promotion_id on promotion_usages pu (cost=0.00..20.54 rows=178 width=0) (actual time=0.001..0.001 rows=0 loops=44998) Index Cond: (promotion_id = p.id) Heap Fetches: 2720 Total runtime: 95.739 ms (12 rows) with "is true" in the conditions: QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=94430.93..94430.94 rows=1 width=0) (actual time=534.568..534.569 rows=1 loops=1) -> Seq Scan on promotions p (cost=0.00..94373.76 rows=22868 width=0) (actual time=0.306..534.165 rows=2720 loops=1) Filter: (((quantity = 1) IS TRUE) AND ((end_at > (now() - '30 days'::interval)) IS TRUE) AND ((SubPlan 1) IS TRUE)) Rows Removed by Filter: 600105 SubPlan 1 -> Index Only Scan using index_promotion_usages_on_promotion_id on promotion_usages pu (cost=0.00..20.54 rows=178 width=0) (actual time=0.001..0.001 rows=0 loops=44998) Index Cond: (promotion_id = p.id) Heap Fetches: 2720 Total runtime: 534.627 ms (9 rows)
Re: Avoiding duplication of code via views -- slower? How do people typically do this?
From
Tom Lane
Date:
Joe Van Dyk <joe@tanga.com> writes: > On Fri, Feb 15, 2013 at 7:43 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Try without the useless "is true" bits. > Huh, that did do the trick. Why does "is true" affect the plan? Because "(x = y) IS TRUE" isn't the same as "x = y". (The behavior for nulls is different.) And the planner only knows about using the latter type of condition for indexscans. Since you need it to convert the end_at condition into an indexscan to get a fast plan, you lose. It's conceivable that we could teach the planner about this case, but I haven't seen enough people doing that to make me think it's worth the code space and planner cycles. regards, tom lane
Re: Avoiding duplication of code via views -- slower? How do people typically do this?
From
Merlin Moncure
Date:
On Fri, Feb 15, 2013 at 1:42 AM, Chris Travers <chris.travers@gmail.com> wrote: > On Thu, Feb 14, 2013 at 4:32 PM, Joe Van Dyk <joe@tanga.com> wrote: >> >> See >> https://gist.github.com/joevandyk/4957646/raw/86d55472ff8b5a4a6740d9c673d18a7005738467/gistfile1.txt >> for the code. >> >> I have promotions(id, end_at, quantity) and >> promotion_usages(promotion_id). >> >> I have a couple of things I typically want to retrieve, and I'd like those >> things to be composable. In this case, finding recently-expired promotions, >> finding promotions that have a quantity of one, and finding promotions that >> were used. >> >> My approach is to put these conditions into views, then I can join against >> each one. But that approach is much slower than inlining all the code. >> >> How is this typically done? > > > First I am not usually a fan of trying to reduce code duplication by using > views. In general, my experience is that this makes it very easy to make > things slower, and it adds unexpected optimization hedges in unexpected > places. > > Your problem here seems to be of this sort. You are joining together two > views in order to add filters. These operations are not really guaranteed > to be the same and so you have an unexpected optimization fence. > > My general rule of thumb is to consider moving inline views and WITH clauses > into views as needed. > > Now I think there are a bunch of ways to accomplish what you are trying to > do here. > > At the risk of jumping straight ahead into advanced functionality and the > accusations that I am making use of magic wands, I will suggest an > object-relational approach to reducing code duplication. This would be to > eliminate most your filter views and make use instead of table methods. > > CREATE FUNCTION recently_expired(promotion) returns bool language sql > immutable as > $$ > select $1.ended at > now() - '30 days'::interval; > $$; > > CREATE FUNCTION is_one_time(promotion) returns bool language sql immutable > as > $$ > select $1.quantity = 1; > $$; Unfortunately from performance point of view that is a much worse way to do things. Pushing checks into function like that forces processing into a iterative model which has a much worse set of performance gotchas that have essentially no workaround. This is because there is no way to force the function to be inlined. What I'd like to see is to have a new function decoration, INLINE, that introduces some constraints to how the function can be written and forces the function to be expanded in the query at plan time. Only then will there be a true alternative to using views, especially if you could inline through a LATERAL function call. merlin.