Thread: Does it make sense to break a large query into separate functions?

Does it make sense to break a large query into separate functions?

From
Seref Arikan
Date:
Greetings, 

I have a domain specific language implementation that ends up as sql after an automatic transformation/code generation phase. 
The complexity of the generated sql queries can be significant, with lots of self joins and many other criteria injected into sql. 

I've just noticed that some parts of the autogenerated queries can be functions on their own. Would moving these parts to their own functions help the query planner? 

There are any two tables and all queries use them, so I thought the query planner may be able to do better if I let it deal with smaller queries rather than one quite large one. I don't know how the query planner works when there are calls to other functions from a function though.

Changing the code generator would take time, so I decided to ask for more educated input than mine.

Best regards
Seref

Seref Arikan <serefarikan@kurumsalteknoloji.com> writes:
> I've just noticed that some parts of the autogenerated queries can be
> functions on their own. Would moving these parts to their own functions
> help the query planner?

It's difficult to answer that without a lot more detail than you've
provided, but my offhand guess would be "no".  Usually it's better
when the planner can expand functions inline, which would just be
reversing the transformation you're thinking of.

            regards, tom lane


Re: Does it make sense to break a large query into separate functions?

From
Seref Arikan
Date:
Thanks Tom, 
I am happy with a 10K feet level discussion of the approach, and your response is just what I was looking for. 

Regards
Seref



On Wed, May 8, 2013 at 3:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Seref Arikan <serefarikan@kurumsalteknoloji.com> writes:
> I've just noticed that some parts of the autogenerated queries can be
> functions on their own. Would moving these parts to their own functions
> help the query planner?

It's difficult to answer that without a lot more detail than you've
provided, but my offhand guess would be "no".  Usually it's better
when the planner can expand functions inline, which would just be
reversing the transformation you're thinking of.

                        regards, tom lane

On 08/05/13 23:40, Seref Arikan wrote:
> Greetings,
>
> I have a domain specific language implementation that ends up as sql
> after an automatic transformation/code generation phase.
> The complexity of the generated sql queries can be significant, with
> lots of self joins and many other criteria injected into sql.
>
> I've just noticed that some parts of the autogenerated queries can be
> functions on their own. Would moving these parts to their own functions
> help the query planner?
>
> There are any two tables and all queries use them, so I thought the
> query planner may be able to do better if I let it deal with smaller
> queries rather than one quite large one. I don't know how the query
> planner works when there are calls to other functions from a function
> though.
>
> Changing the code generator would take time, so I decided to ask for
> more educated input than mine.
>
> Best regards
> Seref
>
Hi Seref,
The code generated sql queries isn't giving you much to work with (or a
choice). However I suspect its doing its best dealing with this data
structure (relationship model). I could be wrong.

But that might be where the problem is.

Regards,
Julian.


Re: Does it make sense to break a large query into separate functions?

From
Merlin Moncure
Date:
On Wed, May 8, 2013 at 9:05 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Seref Arikan <serefarikan@kurumsalteknoloji.com> writes:
>> I've just noticed that some parts of the autogenerated queries can be
>> functions on their own. Would moving these parts to their own functions
>> help the query planner?
>
> It's difficult to answer that without a lot more detail than you've
> provided, but my offhand guess would be "no".  Usually it's better
> when the planner can expand functions inline, which would just be
> reversing the transformation you're thinking of.

In my experience virtually no useful functions are inlined by the
planner.   For example, with function:

create function f(text) returns date as $$
  select to_date($1, 'YYYY'); $$
language sql stable;  /* immutable doesn't help */

I see about 4x time difference between:
select to_date(v::text, 'YYYY') from generate_series(1,100000) v;
and
select f(v::text) from generate_series(1,100000) v;

I'm curious if more aggressive inlining is a future performance
optimization target for postgres or if there is some fundamental
restriction that prevents such functions from being inlined.  From an
abstraction point of view, I'd really like to be able to push more
code into functions, but often can't because of performance issues.

merlin


Merlin Moncure <mmoncure@gmail.com> writes:
> In my experience virtually no useful functions are inlined by the
> planner.   For example, with function:

> create function f(text) returns date as $$
>   select to_date($1, 'YYYY'); $$
> language sql stable;  /* immutable doesn't help */

> I see about 4x time difference between:
> select to_date(v::text, 'YYYY') from generate_series(1,100000) v;
> and
> select f(v::text) from generate_series(1,100000) v;

Something wrong with your measurement technique, because those expand
to identical plan trees for me.

            regards, tom lane


Re: Does it make sense to break a large query into separate functions?

From
Merlin Moncure
Date:
On Wed, May 8, 2013 at 11:12 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Merlin Moncure <mmoncure@gmail.com> writes:
>> In my experience virtually no useful functions are inlined by the
>> planner.   For example, with function:
>
>> create function f(text) returns date as $$
>>   select to_date($1, 'YYYY'); $$
>> language sql stable;  /* immutable doesn't help */
>
>> I see about 4x time difference between:
>> select to_date(v::text, 'YYYY') from generate_series(1,100000) v;
>> and
>> select f(v::text) from generate_series(1,100000) v;
>
> Something wrong with your measurement technique, because those expand
> to identical plan trees for me.

you're right! interesting.  I had left the function f() as 'immutable'
(which is technically incorrect) before running performance test:

postgres=# create or replace function f(text) returns date as $$
  select to_date($1, 'YYYY'); $$
language sql stable;
CREATE FUNCTION
Time: 1.000 ms
postgres=# explain analyze select f(v::text) from generate_series(1,100000) v;
                                                         QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------
 Function Scan on generate_series v  (cost=0.00..17.50 rows=1000
width=4) (actual time=12.949..110.804 rows=100000 loops=1)
 Total runtime: 167.938 ms
(2 rows)

Time: 169.017 ms
postgres=# create or replace function f(text) returns date as $$
  select to_date($1, 'YYYY'); $$
language sql immutable;
CREATE FUNCTION
Time: 2.000 ms
postgres=# explain analyze select f(v::text) from generate_series(1,100000) v;
                                                         QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------
 Function Scan on generate_series v  (cost=0.00..265.00 rows=1000
width=4) (actual time=15.362..499.792 rows=100000 loops=1)
 Total runtime: 562.465 ms
(2 rows)


odd that stable function is inlined but immutable isn't!

merlin


Merlin Moncure <mmoncure@gmail.com> writes:
> odd that stable function is inlined but immutable isn't!

Well, it knows that the expansion to to_date() would only be stable not
immutable (because to_date depends on some GUC settings), so doing the
expansion could change the behavior, eg by preventing constant-folding.

Although usually wrapping a stable function in an immutable one is a
recipe for disaster, we don't forbid it because there are cases where it
makes sense --- for instance, you might know that the function really is
immutable *in your usage*, and want to use it as an index function or
some such.  But the SQL-function wrapper adds a lot of overhead.  I
think a plpgsql wrapper would be better here, if you need to cheat about
the mutability.

            regards, tom lane


Re: Does it make sense to break a large query into separate functions?

From
Gavin Flower
Date:
On 09/05/13 02:47, Merlin Moncure wrote:
On Wed, May 8, 2013 at 9:05 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Seref Arikan <serefarikan@kurumsalteknoloji.com> writes:
I've just noticed that some parts of the autogenerated queries can be
functions on their own. Would moving these parts to their own functions
help the query planner?
It's difficult to answer that without a lot more detail than you've
provided, but my offhand guess would be "no".  Usually it's better
when the planner can expand functions inline, which would just be
reversing the transformation you're thinking of.
In my experience virtually no useful functions are inlined by the
planner.   For example, with function:

create function f(text) returns date as $$ select to_date($1, 'YYYY'); $$
language sql stable;  /* immutable doesn't help */

I see about 4x time difference between:
select to_date(v::text, 'YYYY') from generate_series(1,100000) v;
and
select f(v::text) from generate_series(1,100000) v;

I'm curious if more aggressive inlining is a future performance
optimization target for postgres or if there is some fundamental
restriction that prevents such functions from being inlined.  From an
abstraction point of view, I'd really like to be able to push more
code into functions, but often can't because of performance issues.

merlin


+100

I would very much like to split SQL queries into more manageable parts without loss of performance.  I have had SQL queries spanning over a page, and had a sequence of
SQL queries with a lot in common.  So if I could move the common bits out, it would have made it more maintainable.  This was a few years ago in Sybase, but I would have had the same issues in Postgres.

I remember having the some complicated
SQL queries with multiple sub selects, that had a lot of duplication within the same query, which I would have like to have removed. Common table expressions (the WITH clause) may have helped, but not that much.

However,
common table expressions would have eliminated the need for some temporary tables, but made for some much longer SQL queries.  This was in a stored procedure that was over 3,000 lines long - in SyBase, but I keep thinking how I would have done it in Postgres (I knew of Postgres, but did not have the option to use it).


Cheers,
Gavin

Re: Does it make sense to break a large query into separate functions?

From
Merlin Moncure
Date:
On Wed, May 8, 2013 at 2:04 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Merlin Moncure <mmoncure@gmail.com> writes:
>> odd that stable function is inlined but immutable isn't!
>
> Well, it knows that the expansion to to_date() would only be stable not
> immutable (because to_date depends on some GUC settings), so doing the
> expansion could change the behavior, eg by preventing constant-folding.

I see your point-- but you have to admin it's just plain weird -- in
this example the behavior is in fact immutable and marking it as such
causes it to not be inlined.  For purposes of inlining, regardless of
the implementation, IMO the function decoration should trump forensic
analysis of the function body.  Translation: immutable and stable
functions should *always* be inlined.

More oddness -- when I wrap, say, random() with stable function, I get
unique value per returned row, but same value across the set when
wrapped with immutable.

> Although usually wrapping a stable function in an immutable one is a
> recipe for disaster, we don't forbid it because there are cases where it
> makes sense --- for instance, you might know that the function really is
> immutable *in your usage*, and want to use it as an index function or
> some such.  But the SQL-function wrapper adds a lot of overhead.  I
> think a plpgsql wrapper would be better here, if you need to cheat about
> the mutability.

Right.  In this case, plpgsql is only about 10% faster than
non-inlined sql.  inlined sql completely smokes both of them.
Regardless, this is a scratch example off of the top of my head.  I'm
curious if there's a good reference for inlining rules and if their
limits have been well explored (and if so, so be it).  What I
ultimately want is a way to abstract code without using views, dynamic
sql, etc.

merlin


Re: Does it make sense to break a large query into separate functions?

From
Gavin Flower
Date:
On 09/05/13 09:35, Merlin Moncure wrote:
[...]

> More oddness -- when I wrap, say, random() with stable function, I get
> unique value per returned row, but same value across the set when
> wrapped with immutable.
[..]

That part I think I know (but, I'm often wrong!).

By saying it is immutable, you are saying that the values returned for
the same set of parameters is always the same.  The system looks at your
immutable function that wraps random() and 'knows' that once invoked,
the value returned will always be the same, so why would it want to
invoke your immutable function multiple times, as it can just do that
once and reuse the value returned?


Cheers,
Gavin


Re: Does it make sense to break a large query into separate functions?

From
Merlin Moncure
Date:
On Wed, May 8, 2013 at 4:47 PM, Gavin Flower
<GavinFlower@archidevsys.co.nz> wrote:
> On 09/05/13 09:35, Merlin Moncure wrote:
> [...]
>
>
>> More oddness -- when I wrap, say, random() with stable function, I get
>> unique value per returned row, but same value across the set when wrapped
>> with immutable.
>
> [..]
>
> That part I think I know (but, I'm often wrong!).
>
> By saying it is immutable, you are saying that the values returned for the
> same set of parameters is always the same.  The system looks at your
> immutable function that wraps random() and 'knows' that once invoked, the
> value returned will always be the same, so why would it want to invoke your
> immutable function multiple times, as it can just do that once and reuse the
> value returned?

right. note I think that (reduced invocation of immutable function) is
the correct behavior.    If I say something is immutable, rightly or
wrongly, I am giving postgres a free hand to reduce function calls.
But (wandering off topic here),  But I'm wondering why the stable
wrapper doesn't exhibit the same behavior.   In other words I'm
pedantically chasing down the in-query planning differences between
immutable and stable -- I don't think there should be any (but I
certainly could be wrong about that).

*) shouldn't a stable and immutable functions have the same semantics
from query planning type of view?  To me, immutable conveys all the
guarantees of 'stable', plus the additional guarantee of function
stability beyond the current transaction.

merlin


Merlin Moncure <mmoncure@gmail.com> writes:
> right. note I think that (reduced invocation of immutable function) is
> the correct behavior.    If I say something is immutable, rightly or
> wrongly, I am giving postgres a free hand to reduce function calls.
> But (wandering off topic here),  But I'm wondering why the stable
> wrapper doesn't exhibit the same behavior.

The reason it won't inline such a wrapper is that if it expands foo(x)
to to_date(x, 'YYYY'), the latter expression will be considered stable,
not immutable; so for example any subsequent opportunity to
constant-fold it would not be taken.

It's conceivable that we could somehow decorate the parsetree so that
the function call node for to_date() would still be considered immutable
because it came out of inlining an immutable function.  But that's not
how things work now, and I'm dubious that it'd be worth the trouble.
Really the right answer here is to label the wrapper with the same
mutability level as its contents.

            regards, tom lane


Re: Does it make sense to break a large query into separate functions?

From
Merlin Moncure
Date:
On Thu, May 9, 2013 at 10:15 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Merlin Moncure <mmoncure@gmail.com> writes:
>> right. note I think that (reduced invocation of immutable function) is
>> the correct behavior.    If I say something is immutable, rightly or
>> wrongly, I am giving postgres a free hand to reduce function calls.
>> But (wandering off topic here),  But I'm wondering why the stable
>> wrapper doesn't exhibit the same behavior.
>
> The reason it won't inline such a wrapper is that if it expands foo(x)
> to to_date(x, 'YYYY'), the latter expression will be considered stable,
> not immutable; so for example any subsequent opportunity to
> constant-fold it would not be taken.

But the stable decorated function *is* inlined. I read your reasoning
a couple of times above why decorating the function immutable (as
opposed to stable) defeats inlining, but I'm having trouble parsing
it.  It's not clear to me why stable and immutable functions are
treated differently at all at plan time (regardless of decoration,
assuming we are not talking about volatile functions as non-volatile).

> It's conceivable that we could somehow decorate the parsetree so that
> the function call node for to_date() would still be considered immutable
> because it came out of inlining an immutable function.  But that's not
> how things work now, and I'm dubious that it'd be worth the trouble.
> Really the right answer here is to label the wrapper with the same
> mutability level as its contents.

That's a fair point, but i'd state that to_date/YYYY *is* immutable in
this context and abstracting the functionality into a immutable
function seems reasonable (say, so that you could use the same
function in a functional index).  Having to keep an alternate version
(marked stable) in order to preserve inlining seems ... awkward.

merlin


Merlin Moncure <mmoncure@gmail.com> writes:
> On Thu, May 9, 2013 at 10:15 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> The reason it won't inline such a wrapper is that if it expands foo(x)
>> to to_date(x, 'YYYY'), the latter expression will be considered stable,
>> not immutable; so for example any subsequent opportunity to
>> constant-fold it would not be taken.

> But the stable decorated function *is* inlined.

Right, because then the perceived volatility of the expression isn't
increasing.

> I read your reasoning
> a couple of times above why decorating the function immutable (as
> opposed to stable) defeats inlining, but I'm having trouble parsing
> it.  It's not clear to me why stable and immutable functions are
> treated differently at all at plan time (regardless of decoration,
> assuming we are not talking about volatile functions as non-volatile).

foo('1923') will be folded to a constant at plan time.
to_date('1923','YYYY') will not be.  That's the difference so far
as the planner is concerned.  You can verify this with EXPLAIN VERBOSE.

The rule about not increasing the volatility of an expression is mainly
meant to prevent inlining a stable/immutable function that actually
contains a volatile-labeled function, because doing that would have
significantly greater consequences, eg not being able to use the
expression in an indexqual.  But it's coded as "don't increase the
volatility at all".  I'm not sure whether there would be severe
consequences if we allowed the immutable-to-stable case.  It's at
least possible that we'd end up missing some constant-folding
opportunities, depending on the order of operations in the planner.

            regards, tom lane