Thread: Does it make sense to break a large query into separate functions?
Greetings,
I have a domain specific language implementation that ends up as sql after an automatic transformation/code generation phase.
The complexity of the generated sql queries can be significant, with lots of self joins and many other criteria injected into sql.
I've just noticed that some parts of the autogenerated queries can be functions on their own. Would moving these parts to their own functions help the query planner?
There are any two tables and all queries use them, so I thought the query planner may be able to do better if I let it deal with smaller queries rather than one quite large one. I don't know how the query planner works when there are calls to other functions from a function though.
Changing the code generator would take time, so I decided to ask for more educated input than mine.
Best regards
Seref
Seref Arikan <serefarikan@kurumsalteknoloji.com> writes: > I've just noticed that some parts of the autogenerated queries can be > functions on their own. Would moving these parts to their own functions > help the query planner? It's difficult to answer that without a lot more detail than you've provided, but my offhand guess would be "no". Usually it's better when the planner can expand functions inline, which would just be reversing the transformation you're thinking of. regards, tom lane
Thanks Tom,
I am happy with a 10K feet level discussion of the approach, and your response is just what I was looking for.
Regards
Seref
On Wed, May 8, 2013 at 3:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Seref Arikan <serefarikan@kurumsalteknoloji.com> writes:It's difficult to answer that without a lot more detail than you've
> I've just noticed that some parts of the autogenerated queries can be
> functions on their own. Would moving these parts to their own functions
> help the query planner?
provided, but my offhand guess would be "no". Usually it's better
when the planner can expand functions inline, which would just be
reversing the transformation you're thinking of.
regards, tom lane
On 08/05/13 23:40, Seref Arikan wrote: > Greetings, > > I have a domain specific language implementation that ends up as sql > after an automatic transformation/code generation phase. > The complexity of the generated sql queries can be significant, with > lots of self joins and many other criteria injected into sql. > > I've just noticed that some parts of the autogenerated queries can be > functions on their own. Would moving these parts to their own functions > help the query planner? > > There are any two tables and all queries use them, so I thought the > query planner may be able to do better if I let it deal with smaller > queries rather than one quite large one. I don't know how the query > planner works when there are calls to other functions from a function > though. > > Changing the code generator would take time, so I decided to ask for > more educated input than mine. > > Best regards > Seref > Hi Seref, The code generated sql queries isn't giving you much to work with (or a choice). However I suspect its doing its best dealing with this data structure (relationship model). I could be wrong. But that might be where the problem is. Regards, Julian.
On Wed, May 8, 2013 at 9:05 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Seref Arikan <serefarikan@kurumsalteknoloji.com> writes: >> I've just noticed that some parts of the autogenerated queries can be >> functions on their own. Would moving these parts to their own functions >> help the query planner? > > It's difficult to answer that without a lot more detail than you've > provided, but my offhand guess would be "no". Usually it's better > when the planner can expand functions inline, which would just be > reversing the transformation you're thinking of. In my experience virtually no useful functions are inlined by the planner. For example, with function: create function f(text) returns date as $$ select to_date($1, 'YYYY'); $$ language sql stable; /* immutable doesn't help */ I see about 4x time difference between: select to_date(v::text, 'YYYY') from generate_series(1,100000) v; and select f(v::text) from generate_series(1,100000) v; I'm curious if more aggressive inlining is a future performance optimization target for postgres or if there is some fundamental restriction that prevents such functions from being inlined. From an abstraction point of view, I'd really like to be able to push more code into functions, but often can't because of performance issues. merlin
Merlin Moncure <mmoncure@gmail.com> writes: > In my experience virtually no useful functions are inlined by the > planner. For example, with function: > create function f(text) returns date as $$ > select to_date($1, 'YYYY'); $$ > language sql stable; /* immutable doesn't help */ > I see about 4x time difference between: > select to_date(v::text, 'YYYY') from generate_series(1,100000) v; > and > select f(v::text) from generate_series(1,100000) v; Something wrong with your measurement technique, because those expand to identical plan trees for me. regards, tom lane
On Wed, May 8, 2013 at 11:12 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Merlin Moncure <mmoncure@gmail.com> writes: >> In my experience virtually no useful functions are inlined by the >> planner. For example, with function: > >> create function f(text) returns date as $$ >> select to_date($1, 'YYYY'); $$ >> language sql stable; /* immutable doesn't help */ > >> I see about 4x time difference between: >> select to_date(v::text, 'YYYY') from generate_series(1,100000) v; >> and >> select f(v::text) from generate_series(1,100000) v; > > Something wrong with your measurement technique, because those expand > to identical plan trees for me. you're right! interesting. I had left the function f() as 'immutable' (which is technically incorrect) before running performance test: postgres=# create or replace function f(text) returns date as $$ select to_date($1, 'YYYY'); $$ language sql stable; CREATE FUNCTION Time: 1.000 ms postgres=# explain analyze select f(v::text) from generate_series(1,100000) v; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------- Function Scan on generate_series v (cost=0.00..17.50 rows=1000 width=4) (actual time=12.949..110.804 rows=100000 loops=1) Total runtime: 167.938 ms (2 rows) Time: 169.017 ms postgres=# create or replace function f(text) returns date as $$ select to_date($1, 'YYYY'); $$ language sql immutable; CREATE FUNCTION Time: 2.000 ms postgres=# explain analyze select f(v::text) from generate_series(1,100000) v; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------- Function Scan on generate_series v (cost=0.00..265.00 rows=1000 width=4) (actual time=15.362..499.792 rows=100000 loops=1) Total runtime: 562.465 ms (2 rows) odd that stable function is inlined but immutable isn't! merlin
Merlin Moncure <mmoncure@gmail.com> writes: > odd that stable function is inlined but immutable isn't! Well, it knows that the expansion to to_date() would only be stable not immutable (because to_date depends on some GUC settings), so doing the expansion could change the behavior, eg by preventing constant-folding. Although usually wrapping a stable function in an immutable one is a recipe for disaster, we don't forbid it because there are cases where it makes sense --- for instance, you might know that the function really is immutable *in your usage*, and want to use it as an index function or some such. But the SQL-function wrapper adds a lot of overhead. I think a plpgsql wrapper would be better here, if you need to cheat about the mutability. regards, tom lane
On 09/05/13 02:47, Merlin Moncure wrote:
+100On Wed, May 8, 2013 at 9:05 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:Seref Arikan <serefarikan@kurumsalteknoloji.com> writes:I've just noticed that some parts of the autogenerated queries can be functions on their own. Would moving these parts to their own functions help the query planner?It's difficult to answer that without a lot more detail than you've provided, but my offhand guess would be "no". Usually it's better when the planner can expand functions inline, which would just be reversing the transformation you're thinking of.In my experience virtually no useful functions are inlined by the planner. For example, with function: create function f(text) returns date as $$ select to_date($1, 'YYYY'); $$ language sql stable; /* immutable doesn't help */ I see about 4x time difference between: select to_date(v::text, 'YYYY') from generate_series(1,100000) v; and select f(v::text) from generate_series(1,100000) v; I'm curious if more aggressive inlining is a future performance optimization target for postgres or if there is some fundamental restriction that prevents such functions from being inlined. From an abstraction point of view, I'd really like to be able to push more code into functions, but often can't because of performance issues. merlin
I would very much like to split SQL queries into more manageable parts without loss of performance. I have had SQL queries spanning over a page, and had a sequence of SQL queries with a lot in common. So if I could move the common bits out, it would have made it more maintainable. This was a few years ago in Sybase, but I would have had the same issues in Postgres.
I remember having the some complicated SQL queries with multiple sub selects, that had a lot of duplication within the same query, which I would have like to have removed. Common table expressions (the WITH clause) may have helped, but not that much.
However, common table expressions would have eliminated the need for some temporary tables, but made for some much longer SQL queries. This was in a stored procedure that was over 3,000 lines long - in SyBase, but I keep thinking how I would have done it in Postgres (I knew of Postgres, but did not have the option to use it).
Cheers,
Gavin
On Wed, May 8, 2013 at 2:04 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Merlin Moncure <mmoncure@gmail.com> writes: >> odd that stable function is inlined but immutable isn't! > > Well, it knows that the expansion to to_date() would only be stable not > immutable (because to_date depends on some GUC settings), so doing the > expansion could change the behavior, eg by preventing constant-folding. I see your point-- but you have to admin it's just plain weird -- in this example the behavior is in fact immutable and marking it as such causes it to not be inlined. For purposes of inlining, regardless of the implementation, IMO the function decoration should trump forensic analysis of the function body. Translation: immutable and stable functions should *always* be inlined. More oddness -- when I wrap, say, random() with stable function, I get unique value per returned row, but same value across the set when wrapped with immutable. > Although usually wrapping a stable function in an immutable one is a > recipe for disaster, we don't forbid it because there are cases where it > makes sense --- for instance, you might know that the function really is > immutable *in your usage*, and want to use it as an index function or > some such. But the SQL-function wrapper adds a lot of overhead. I > think a plpgsql wrapper would be better here, if you need to cheat about > the mutability. Right. In this case, plpgsql is only about 10% faster than non-inlined sql. inlined sql completely smokes both of them. Regardless, this is a scratch example off of the top of my head. I'm curious if there's a good reference for inlining rules and if their limits have been well explored (and if so, so be it). What I ultimately want is a way to abstract code without using views, dynamic sql, etc. merlin
On 09/05/13 09:35, Merlin Moncure wrote: [...] > More oddness -- when I wrap, say, random() with stable function, I get > unique value per returned row, but same value across the set when > wrapped with immutable. [..] That part I think I know (but, I'm often wrong!). By saying it is immutable, you are saying that the values returned for the same set of parameters is always the same. The system looks at your immutable function that wraps random() and 'knows' that once invoked, the value returned will always be the same, so why would it want to invoke your immutable function multiple times, as it can just do that once and reuse the value returned? Cheers, Gavin
On Wed, May 8, 2013 at 4:47 PM, Gavin Flower <GavinFlower@archidevsys.co.nz> wrote: > On 09/05/13 09:35, Merlin Moncure wrote: > [...] > > >> More oddness -- when I wrap, say, random() with stable function, I get >> unique value per returned row, but same value across the set when wrapped >> with immutable. > > [..] > > That part I think I know (but, I'm often wrong!). > > By saying it is immutable, you are saying that the values returned for the > same set of parameters is always the same. The system looks at your > immutable function that wraps random() and 'knows' that once invoked, the > value returned will always be the same, so why would it want to invoke your > immutable function multiple times, as it can just do that once and reuse the > value returned? right. note I think that (reduced invocation of immutable function) is the correct behavior. If I say something is immutable, rightly or wrongly, I am giving postgres a free hand to reduce function calls. But (wandering off topic here), But I'm wondering why the stable wrapper doesn't exhibit the same behavior. In other words I'm pedantically chasing down the in-query planning differences between immutable and stable -- I don't think there should be any (but I certainly could be wrong about that). *) shouldn't a stable and immutable functions have the same semantics from query planning type of view? To me, immutable conveys all the guarantees of 'stable', plus the additional guarantee of function stability beyond the current transaction. merlin
Merlin Moncure <mmoncure@gmail.com> writes: > right. note I think that (reduced invocation of immutable function) is > the correct behavior. If I say something is immutable, rightly or > wrongly, I am giving postgres a free hand to reduce function calls. > But (wandering off topic here), But I'm wondering why the stable > wrapper doesn't exhibit the same behavior. The reason it won't inline such a wrapper is that if it expands foo(x) to to_date(x, 'YYYY'), the latter expression will be considered stable, not immutable; so for example any subsequent opportunity to constant-fold it would not be taken. It's conceivable that we could somehow decorate the parsetree so that the function call node for to_date() would still be considered immutable because it came out of inlining an immutable function. But that's not how things work now, and I'm dubious that it'd be worth the trouble. Really the right answer here is to label the wrapper with the same mutability level as its contents. regards, tom lane
On Thu, May 9, 2013 at 10:15 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Merlin Moncure <mmoncure@gmail.com> writes: >> right. note I think that (reduced invocation of immutable function) is >> the correct behavior. If I say something is immutable, rightly or >> wrongly, I am giving postgres a free hand to reduce function calls. >> But (wandering off topic here), But I'm wondering why the stable >> wrapper doesn't exhibit the same behavior. > > The reason it won't inline such a wrapper is that if it expands foo(x) > to to_date(x, 'YYYY'), the latter expression will be considered stable, > not immutable; so for example any subsequent opportunity to > constant-fold it would not be taken. But the stable decorated function *is* inlined. I read your reasoning a couple of times above why decorating the function immutable (as opposed to stable) defeats inlining, but I'm having trouble parsing it. It's not clear to me why stable and immutable functions are treated differently at all at plan time (regardless of decoration, assuming we are not talking about volatile functions as non-volatile). > It's conceivable that we could somehow decorate the parsetree so that > the function call node for to_date() would still be considered immutable > because it came out of inlining an immutable function. But that's not > how things work now, and I'm dubious that it'd be worth the trouble. > Really the right answer here is to label the wrapper with the same > mutability level as its contents. That's a fair point, but i'd state that to_date/YYYY *is* immutable in this context and abstracting the functionality into a immutable function seems reasonable (say, so that you could use the same function in a functional index). Having to keep an alternate version (marked stable) in order to preserve inlining seems ... awkward. merlin
Merlin Moncure <mmoncure@gmail.com> writes: > On Thu, May 9, 2013 at 10:15 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> The reason it won't inline such a wrapper is that if it expands foo(x) >> to to_date(x, 'YYYY'), the latter expression will be considered stable, >> not immutable; so for example any subsequent opportunity to >> constant-fold it would not be taken. > But the stable decorated function *is* inlined. Right, because then the perceived volatility of the expression isn't increasing. > I read your reasoning > a couple of times above why decorating the function immutable (as > opposed to stable) defeats inlining, but I'm having trouble parsing > it. It's not clear to me why stable and immutable functions are > treated differently at all at plan time (regardless of decoration, > assuming we are not talking about volatile functions as non-volatile). foo('1923') will be folded to a constant at plan time. to_date('1923','YYYY') will not be. That's the difference so far as the planner is concerned. You can verify this with EXPLAIN VERBOSE. The rule about not increasing the volatility of an expression is mainly meant to prevent inlining a stable/immutable function that actually contains a volatile-labeled function, because doing that would have significantly greater consequences, eg not being able to use the expression in an indexqual. But it's coded as "don't increase the volatility at all". I'm not sure whether there would be severe consequences if we allowed the immutable-to-stable case. It's at least possible that we'd end up missing some constant-folding opportunities, depending on the order of operations in the planner. regards, tom lane