Thread: Optimze usage of immutable functions as relation

Optimze usage of immutable functions as relation

From
Aleksandr Parfenov
Date:
Hi hackers,

There is a strange behavior of the query planner in some cases if
stable/immutable was used a relation. In some cases, it affects costs of
operations and leads to a bad plan of the execution. Oleg Bartunov 
noticed
such behavior in queries with a to_tsvector as a relation:

=# explain select '|'||subject||'|', ts_rank_cd(body_tsvector,q) from 
messages, to_tsquery('tuple&header&overhead') q where body_tsvector @@ 
q;
                                         QUERY PLAN
------------------------------------------------------------------------------------------
  Nested Loop  (cost=383.37..58547.70 rows=4937 width=36)
    ->  Function Scan on to_tsquery q  (cost=0.25..0.26 rows=1 width=32)
    ->  Bitmap Heap Scan on messages  (cost=383.12..58461.04 rows=4937 
width=275)
          Recheck Cond: (body_tsvector @@ q.q)
          ->  Bitmap Index Scan on message_body_idx  (cost=0.00..381.89 
rows=4937 width=0)
                Index Cond: (body_tsvector @@ q.q)
(6 rows)

=# explain select '|'||subject||'|', ts_rank_cd(body_tsvector,q) from 
messages, to_tsquery('tuple&header&overhead') q where body_tsvector @@
q limit 10;
                                    QUERY PLAN
--------------------------------------------------------------------------------
  Limit  (cost=0.25..425.62 rows=10 width=36)
    ->  Nested Loop  (cost=0.25..210005.80 rows=4937 width=36)
          Join Filter: (messages.body_tsvector @@ q.q)
          ->  Function Scan on to_tsquery q  (cost=0.25..0.26 rows=1 
width=32)
          ->  Seq Scan on messages  (cost=0.00..197625.45 rows=987445 
width=275)

The idea of the fix for this situation is to check is a result of the
function constant or not during the planning of the query. Attached 
patch does
this by processing Var entries at planner stage and replace them with
constant value if it is possible. Plans after applying a patch (SeqScan
query for comparison):

=# explain select '|'||subject||'|', ts_rank_cd(body_tsvector,q) from 
messages, to_tsquery('tuple&header&overhead') q where body_tsvector @@
q limit 10;
                                           QUERY PLAN
----------------------------------------------------------------------------------------------
  Limit  (cost=224.66..268.11 rows=3 width=36)
    ->  Nested Loop  (cost=224.66..268.11 rows=3 width=36)
          ->  Function Scan on to_tsquery q  (cost=0.25..0.26 rows=1 
width=0)
          ->  Bitmap Heap Scan on messages  (cost=224.41..267.04 rows=3 
width=275)
                Recheck Cond: (body_tsvector @@ 
to_tsquery('tuple&header&overhead'::text))
                ->  Bitmap Index Scan on message_body_idx  
(cost=0.00..224.41 rows=3 width=0)
                      Index Cond: (body_tsvector @@ 
to_tsquery('tuple&header&overhead'::text))
(7 rows)

=# set enable_bitmapscan=off;
SET
=# explain select '|'||subject||'|', ts_rank_cd(body_tsvector,q) from 
messages, to_tsquery('tuple&header&overhead') q where body_tsvector @@
q limit 10;
                                         QUERY PLAN
------------------------------------------------------------------------------------------
  Limit  (cost=1000.25..296754.14 rows=3 width=36)
    ->  Gather  (cost=1000.25..296754.14 rows=3 width=36)
          Workers Planned: 2
          ->  Nested Loop  (cost=0.25..295753.32 rows=1 width=36)
                ->  Parallel Seq Scan on messages
(cost=0.00..295752.80 rows=1 width=275)
                      Filter: (body_tsvector @@ 
to_tsquery('tuple&header&overhead'::text))
                ->  Function Scan on to_tsquery q  (cost=0.25..0.26 
rows=1 width=0)
(7 rows)

-- 
Aleksandr Parfenov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

Attachment

Re: Optimze usage of immutable functions as relation

From
Andrew Gierth
Date:
>>>>> "Aleksandr" == Aleksandr Parfenov <a.parfenov@postgrespro.ru> writes:

 Aleksandr> The idea of the fix for this situation is to check is a
 Aleksandr> result of the function constant or not during the planning
 Aleksandr> of the query. Attached patch does this by processing Var
 Aleksandr> entries at planner stage and replace them with constant
 Aleksandr> value if it is possible. Plans after applying a patch
 Aleksandr> (SeqScan query for comparison):

From an implementation point of view your patch is obviously broken in
many ways (starting with not checking varattno anywhere, and not
actually checking anywhere if the expression is volatile).

But more importantly the plans that you showed seem _worse_ in that
you've created extra calls to to_tsquery even though the query has been
written to try and avoid that.

I think you need to take a step back and explain more precisely what you
think you're trying to do and what the benefit (and cost) is.

Specific coding style points (not exhaustive):

 Aleksandr>  pointedNode->functions->length == 1

list_length()

 Aleksandr> pointedNode->functions->head->data.ptr_value

linitial() or linitial_node()

 Aleksandr> if (result->type == T_FuncExpr)

IsA()

(what if the result is the inline expansion of a volatile function?)

 Aleksandr>     pfree(result);

result is a whole tree of nodes, pfree is pointless here

(don't bother trying to fix the above points in this patch, that won't
address the fundamental flaws)

-- 
Andrew (irc:RhodiumToad)


Re: Optimze usage of immutable functions as relation

From
Aleksandr Parfenov
Date:
Hello Andrew,

Thank you for the review of the patch.

On Fri, 04 May 2018 08:37:31 +0100
Andrew Gierth <andrew@tao11.riddles.org.uk> wrote:
> From an implementation point of view your patch is obviously broken in
> many ways (starting with not checking varattno anywhere, and not
> actually checking anywhere if the expression is volatile).

The actual checking if the expression volatile or not is done inside
evaluate_function(). This is called through few more function in
eval_const_experssion(). If it's volatile, the eval_const_expression()
will return FuncExpr node, Const otherwise. It also checks are arguments
immutable or not.

I agree about varattno, it should be checked. Even in case of SRF not
replaced, it is better to be sure that Var points to first (and the
only) attribute.

> But more importantly the plans that you showed seem _worse_ in that
> you've created extra calls to to_tsquery even though the query has
> been written to try and avoid that.
> 
> I think you need to take a step back and explain more precisely what
> you think you're trying to do and what the benefit (and cost) is.

Sure, the first version is some kind of prototype and attempt to
improve execution of the certain type of queries. The best solution I
see is to use the result of the function where it is required and remove
the nested loop in case of immutable functions. In this case, the
planner can choose a better plan if function result is used for tuples
selecting or as a join filter. But it will increase planning time due to
the execution of immutable functions.

It looks like I did something wrong with plans in the example, because
attached patch replaces function-relation usage with result value, not
function call. But nested loop is still in the plan.

I'm open to any suggestions/notices/critics about the idea and approach.

-- 
Aleksandr Parfenov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company


Re: Optimze usage of immutable functions as relation

From
Andrew Gierth
Date:
>>>>> "Aleksandr" == Aleksandr Parfenov <a.parfenov@postgrespro.ru> writes:

 >> From an implementation point of view your patch is obviously broken
 >> in many ways (starting with not checking varattno anywhere, and not
 >> actually checking anywhere if the expression is volatile).

 Aleksandr> The actual checking if the expression volatile or not is
 Aleksandr> done inside evaluate_function(). This is called through few
 Aleksandr> more function in eval_const_experssion(). If it's volatile,
 Aleksandr> the eval_const_expression() will return FuncExpr node, Const
 Aleksandr> otherwise. It also checks are arguments immutable or not.

You're missing a ton of other possible cases, of which by far the most
notable is function inlining: eval_const_expressions will inline even a
volatile function and return a new expression tree (which could be
almost anything depending on the function body).

 Aleksandr> I agree about varattno, it should be checked. Even in case
 Aleksandr> of SRF not replaced, it is better to be sure that Var points
 Aleksandr> to first (and the only) attribute.

It's not a matter of "better", but of basic correctness. Functions can
return composite values with columns.

-- 
Andrew (irc:RhodiumToad)


Re: Optimze usage of immutable functions as relation

From
Aleksandr Parfenov
Date:
Hello,

I reworked a patch to make more stable in different cases. I decided to
use simplify_function instead of eval_const_expression to prevent
inlining of the function. The only possible outputs of the
simplify_function are Const node and NULL.

Also, I block pre-evaluation of functions with types other than
TYPTYPE_BASE, cause there is no special logic for compound (and others)
values yet.

There is still a problem with memory leak in case of simplified
arguments. The only way I see is a creation of temporary memory context,
but it cost some performance. Maybe we can store simplified arguments
in the pointed function itself for later use. But eval_const_expression
and friends doesn't change the content of the nodes inside the tree, it
generates new nodes and returns it as a result.

The last point to mention is a fixed plan for the query in the initial
letter of the thread. As I mentioned before, new versions of the patch
replace var not with a function call, but with a function execution
result. After the patch, the following plan is used instead of Nested
Loop with Sequence Scan:

explain select '|'||subject||'|', ts_rank_cd(body_tsvector,q) from messages, to_tsquery('english',
'tuple&header&overhead')q where body_tsvector @@ q limit 10;
 
                                             QUERY PLAN
----------------------------------------------------------------------------------------------------
 Limit  (cost=224.16..266.11 rows=3 width=36)
   ->  Nested Loop  (cost=224.16..266.11 rows=3 width=36)
         ->  Function Scan on q  (cost=0.00..0.01 rows=1 width=0)
         ->  Bitmap Heap Scan on messages  (cost=224.16..266.04 rows=3 width=275)
               Recheck Cond: (body_tsvector @@ '''tupl'' & ''header'' & ''overhead'''::tsquery)
               ->  Bitmap Index Scan on message_body_idx  (cost=0.00..224.16 rows=3 width=0)
                     Index Cond: (body_tsvector @@ '''tupl'' & ''header'' & ''overhead'''::tsquery)

-- 
Aleksandr Parfenov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

Attachment

Re: Optimze usage of immutable functions as relation

From
Heikki Linnakangas
Date:
On 16/05/18 13:47, Aleksandr Parfenov wrote:
> Hello,
> 
> I reworked a patch to make more stable in different cases. I decided to
> use simplify_function instead of eval_const_expression to prevent
> inlining of the function. The only possible outputs of the
> simplify_function are Const node and NULL.

Hmm. That's still a bit inefficient, we'll try to simplify the function 
on every reference to it.

We already simplify functions in function RTEs, but we currently do it 
after preprocessing all other expressions in the query. See 
subquery_planner(), around comment "/* Also need to preprocess 
expressions within RTEs */". If we moved that so that we simplify 
expressions in the range table first, then in the code that you're 
adding to eval_const_expression_mutator(), you could just check if the 
function expression is already a Const.


But stepping back a bit, it's a bit weird that we're handling this 
differently from VALUES and other subqueries. The planner knows how to 
do this trick for simple subqueries:

postgres=# explain select * from tenk1, (select abs(100)) as a (a) where 
unique1 < a;
                         QUERY PLAN
-----------------------------------------------------------
  Seq Scan on tenk1  (cost=0.00..483.00 rows=100 width=248)
    Filter: (unique1 < 100)
(2 rows)

Note that it not only evaluated the function into a constant, but also 
got rid of the join. For a function RTE, however, it can't do that:

postgres=# explain select * from tenk1, abs(100) as a (a) where unique1 < a;
                             QUERY PLAN
-------------------------------------------------------------------
  Nested Loop  (cost=0.00..583.01 rows=3333 width=248)
    Join Filter: (tenk1.unique1 < a.a)
    ->  Function Scan on a  (cost=0.00..0.01 rows=1 width=4)
    ->  Seq Scan on tenk1  (cost=0.00..458.00 rows=10000 width=244)
(4 rows)

Could we handle this in pull_up_subqueries(), similar to the 
RTE_SUBQUERY and RTE_VALUES cases?

- Heikki


Re: Optimze usage of immutable functions as relation

From
Tom Lane
Date:
Heikki Linnakangas <hlinnaka@iki.fi> writes:
> But stepping back a bit, it's a bit weird that we're handling this 
> differently from VALUES and other subqueries. The planner knows how to 
> do this trick for simple subqueries:

> postgres=# explain select * from tenk1, (select abs(100)) as a (a) where 
> unique1 < a;
>                          QUERY PLAN
> -----------------------------------------------------------
>   Seq Scan on tenk1  (cost=0.00..483.00 rows=100 width=248)
>     Filter: (unique1 < 100)
> (2 rows)

> Note that it not only evaluated the function into a constant, but also 
> got rid of the join. For a function RTE, however, it can't do that:

> postgres=# explain select * from tenk1, abs(100) as a (a) where unique1 < a;
>                              QUERY PLAN
> -------------------------------------------------------------------
>   Nested Loop  (cost=0.00..583.01 rows=3333 width=248)
>     Join Filter: (tenk1.unique1 < a.a)
>     ->  Function Scan on a  (cost=0.00..0.01 rows=1 width=4)
>     ->  Seq Scan on tenk1  (cost=0.00..458.00 rows=10000 width=244)
> (4 rows)

> Could we handle this in pull_up_subqueries(), similar to the 
> RTE_SUBQUERY and RTE_VALUES cases?

Perhaps.  You could only do it for non-set-returning functions, which
isn't the typical use of function RTEs, which is probably why we've not
thought hard about it before.  I'm not sure what would need to happen for
lateral functions.  Also to be considered, if it's not foldable to a
constant, is whether we're risking evaluating it more times than before.

            regards, tom lane


Re: Optimze usage of immutable functions as relation

From
Aleksandr Parfenov
Date:
On Tue, 10 Jul 2018 17:34:20 -0400
Tom Lane <tgl@sss.pgh.pa.us> wrote:

>Heikki Linnakangas <hlinnaka@iki.fi> writes:
>> But stepping back a bit, it's a bit weird that we're handling this 
>> differently from VALUES and other subqueries. The planner knows how
>> to do this trick for simple subqueries:  
>
>> postgres=# explain select * from tenk1, (select abs(100)) as a (a)
>> where unique1 < a;
>>                          QUERY PLAN
>> -----------------------------------------------------------
>>   Seq Scan on tenk1  (cost=0.00..483.00 rows=100 width=248)
>>     Filter: (unique1 < 100)
>> (2 rows)  
>
>> Note that it not only evaluated the function into a constant, but
>> also got rid of the join. For a function RTE, however, it can't do
>> that:  
>
>> postgres=# explain select * from tenk1, abs(100) as a (a) where
>> unique1 < a; QUERY PLAN
>> -------------------------------------------------------------------
>>   Nested Loop  (cost=0.00..583.01 rows=3333 width=248)
>>     Join Filter: (tenk1.unique1 < a.a)  
>>     ->  Function Scan on a  (cost=0.00..0.01 rows=1 width=4)
>>     ->  Seq Scan on tenk1  (cost=0.00..458.00 rows=10000 width=244)  
>> (4 rows)  
>
>> Could we handle this in pull_up_subqueries(), similar to the 
>> RTE_SUBQUERY and RTE_VALUES cases?  
>
>Perhaps.  You could only do it for non-set-returning functions, which
>isn't the typical use of function RTEs, which is probably why we've not
>thought hard about it before.  I'm not sure what would need to happen
>for lateral functions.  Also to be considered, if it's not foldable to
>a constant, is whether we're risking evaluating it more times than
>before.
>
>            regards, tom lane

I reworked the patch and implemented processing of FuncScan in
pull_up_subqueries() in a way similar to VALUES processing. In order to
prevent folding of non-foldable functions it checks provolatile of the
function and are arguments const or not and return type to prevent
folding of SRF.

-- 
Aleksandr Parfenov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

Attachment

Re: Optimze usage of immutable functions as relation

From
Anthony Bykov
Date:
The following review has been posted through the commitfest application:
make installcheck-world:  tested, failed
Implements feature:       not tested
Spec compliant:           not tested
Documentation:            not tested

Hello, 
I was trying to review your patch, but I couldn't install it:

prepjointree.c: In function ‘pull_up_simple_function’:
prepjointree.c:1793:41: error: ‘functions’ undeclared (first use in this function); did you mean ‘PGFunction’?
  Assert(!contain_vars_of_level((Node *) functions, 0));

Was it a typo?

The new status of this patch is: Waiting on Author

Re: Optimze usage of immutable functions as relation

From
Aleksandr Parfenov
Date:
Hi,

Thank you for the review.
I fixed a typo and some comments. Please find new version attached.

---
Best regards,
Parfenov Aleksandr

On Fri, 19 Oct 2018 at 16:40, Anthony Bykov <a.bykov@postgrespro.ru> wrote:
The following review has been posted through the commitfest application:
make installcheck-world:  tested, failed
Implements feature:       not tested
Spec compliant:           not tested
Documentation:            not tested

Hello,
I was trying to review your patch, but I couldn't install it:

prepjointree.c: In function ‘pull_up_simple_function’:
prepjointree.c:1793:41: error: ‘functions’ undeclared (first use in this function); did you mean ‘PGFunction’?
  Assert(!contain_vars_of_level((Node *) functions, 0));

Was it a typo?

The new status of this patch is: Waiting on Author
Attachment

Re: Optimze usage of immutable functions as relation

From
Kyotaro HORIGUCHI
Date:
Hello.

# It might be better that you provided self-contained test case.

As the discussion between Heikki and Tom just upthread, it would
be doable but that usage isn't typical. The query would be
normally written as followings and they are transformed as
desired.

select '|'||subject||'|', ts_rank_cd(body_tsvector,q) from 
messages where body_tsvector @@ to_tsquery('tuple&header&overhead');

or (this doesn't look normal, thought..)

select '|'||subject||'|', ts_rank_cd(body_tsvector,q) from 
messages, (select to_tsquery('tuple&header&overhead') as q) q
where body_tsvector @@ q;

This means that the wanted pull up logic is almost already
there. You should look on how it is handled.


At Sat, 20 Oct 2018 01:31:04 +0700, Aleksandr Parfenov <asp437@gmail.com> wrote in
<CACdpekK1oDy7-_HnXOaREa_8HM2r-fsp8iv5e6p8aWOdGdK8Mg@mail.gmail.com>
> Hi,
> 
> Thank you for the review.
> I fixed a typo and some comments. Please find new version attached.

I had the following error with the following query.

=# explain select * from pg_stat_get_activity(NULL) a join log(100000.0) p on a.pid = p.p;
ERROR:  no relation entry for relid 2


As Andrew pointed, you are missing many things to do.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Optimze usage of immutable functions as relation

From
Antonin Houska
Date:
Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote:

> I had the following error with the following query.
>
> =# explain select * from pg_stat_get_activity(NULL) a join log(100000.0) p on a.pid = p.p;
> ERROR:  no relation entry for relid 2
>

I think that the problem is that RTE_VALUES is wrapped in a subquery by parser
while RTE_FUNCTION is not. Thus some additional processing is done by
pull_up_simple_subquery() for VALUES. What seems to make difference here is
the call of flatten_join_alias_vars() on the query targetlist, although
pull_up_simple_subquery() does it for other reasons.

Actually the comment about flatten_join_alias_vars() in
pull_up_simple_subquery() makes me think if it's o.k. that
pull_up_simple_function() sets rvcontext.need_phvs=false regardless strictness
of the function: I think PlaceHolderVar might be needed if the function is not
strict. (In contrast, pull_up_simple_values() does not have to care because
pull_up_simple_subquery() handles such cases when it's processing the owning
subquery).

--
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26, A-2700 Wiener Neustadt
Web: https://www.cybertec-postgresql.com


Re: Optimze usage of immutable functions as relation

From
Antonin Houska
Date:
Aleksandr Parfenov <asp437@gmail.com> wrote:

> I fixed a typo and some comments. Please find new version attached.

I've checked this verstion too.

* is_simple_stable_function()

instead of fetching HeapTuple from the syscache manually, you might want to
consider using functions from lsyscache.c (get_func_rettype, get_func_retset,
etc.), or adding a function that returns (subset of) the fields you need in a
single call.

* pull_up_simple_function():

As you assume that ret->functions is a single-item list

    Assert(list_length(rte->functions) == 1);

the following iteration is not necessary:

    foreach(lc, functions_list)

Also, there seems to be a lot of copy & paste from pull_up_simple_values(), so
some refactoring would make sense.


--
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26, A-2700 Wiener Neustadt
Web: https://www.cybertec-postgresql.com


Re: Optimze usage of immutable functions as relation

From
Tom Lane
Date:
Aleksandr Parfenov <asp437@gmail.com> writes:
> [ funcscan_plan_optimizer_v4.patch ]

A small note here --- this would be noticeably cleaner if removal of
the no-longer-needed function RTE were done using the dummy-relation
infrastructure I proposed in

https://commitfest.postgresql.org/20/1827/

Maybe we should get that done first and then come back to this.

I'm a bit suspicious of the premise of this though, because it's
going to result in a single call of a function being converted
into possibly many calls, if the RTE is referenced in a lot of
places.  Even if the function is immutable so that those calls
get evaluated at plan time not run time, it's still N calls not
one, which'd be bad if the function is expensive.  See e.g.

https://www.postgresql.org/message-id/flat/CAOYf6edujEOGB-9fGG_V9PGQ5O9yoyWmTnE9RyBYTGw%2BDq3SpA%40mail.gmail.com

for an example where we're specifically telling people that
they can put a function into FROM to avoid multiple evaluation.
This patch breaks that guarantee.

A possible fix for this is to do eval_const_expressions() on
function RTE expressions at this stage (and then not need to
do it later), and then pull up only when we find that the
RTE expression has been reduced to a single Const.  I'm not
sure whether invoking eval_const_expressions() so early
would create any issues.  But if it can be made to work, this
would eliminate quite a lot of the ad-hoc conditions that
you have in the patch now.

            regards, tom lane


Re: Optimze usage of immutable functions as relation

From
Andres Freund
Date:
On 2018-11-08 15:08:03 +0100, Antonin Houska wrote:
> Aleksandr Parfenov <asp437@gmail.com> wrote:
> 
> > I fixed a typo and some comments. Please find new version attached.
> 
> I've checked this verstion too.
> 
> * is_simple_stable_function()
> 
> instead of fetching HeapTuple from the syscache manually, you might want to
> consider using functions from lsyscache.c (get_func_rettype, get_func_retset,
> etc.), or adding a function that returns (subset of) the fields you need in a
> single call.
> 
> * pull_up_simple_function():
> 
> As you assume that ret->functions is a single-item list
> 
>     Assert(list_length(rte->functions) == 1);
> 
> the following iteration is not necessary:
> 
>     foreach(lc, functions_list)
> 
> Also, there seems to be a lot of copy & paste from pull_up_simple_values(), so
> some refactoring would make sense.

Given this I think the appropriate state of the CF entry would have been
waiting-for-author, not needs review. Or alternatively
returned-with-feedback or rejected.  I'm a bit confused as to why the
patch was moved to the next CF twice?

- Andres


Re: Optimze usage of immutable functions as relation

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> Given this I think the appropriate state of the CF entry would have been
> waiting-for-author, not needs review. Or alternatively
> returned-with-feedback or rejected.  I'm a bit confused as to why the
> patch was moved to the next CF twice?

We have this review from Antonin, and mine in
https://www.postgresql.org/message-id/2906.1542395026%40sss.pgh.pa.us
and there's the cfbot report that the patch doesn't even apply anymore.

The dummy-relation stuff I referred to has now been merged, so there's
really no good reason not to revise the patch along that line.

I'm going to go set this CF entry to waiting-for-author, but unless
a rewritten patch appears soon, I think we should close it
returned-with-feedback.  I think the idea is potentially good, but
as I said in my review, I don't like this implementation; it has the
potential to be a net loss in some cases.

            regards, tom lane


Re: Optimze usage of immutable functions as relation

From
Alexander Kuzmenkov
Date:
On 2/18/19 03:20, Tom Lane wrote:
> The dummy-relation stuff I referred to has now been merged, so there's
> really no good reason not to revise the patch along that line.


I'll try to post the revised implementation soon.


-- 
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Re: Optimze usage of immutable functions as relation

From
David Steele
Date:
On 2/28/19 4:27 PM, Alexander Kuzmenkov wrote:
> On 2/18/19 03:20, Tom Lane wrote:
>> The dummy-relation stuff I referred to has now been merged, so there's
>> really no good reason not to revise the patch along that line.
> 
> 
> I'll try to post the revised implementation soon.

I'll close this on March 8th if there is no new patch.

Regards,
-- 
-David
david@pgmasters.net


Re: Optimze usage of immutable functions as relation

From
Alexander Kuzmenkov
Date:
On 3/5/19 20:22, David Steele wrote:
>
> I'll close this on March 8th if there is no new patch.


This is taking longer than expected. I'll move it to the next commitfest 
and continue working on the patch.

-- 
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Optimze usage of immutable functions as relation

From
Alexander Kuzmenkov
Date:
On 11/16/18 22:03, Tom Lane wrote:
> A possible fix for this is to do eval_const_expressions() on
> function RTE expressions at this stage (and then not need to
> do it later), and then pull up only when we find that the
> RTE expression has been reduced to a single Const.


Attached is a patch that does this, and transforms RTE_FUCTION that was 
reduced to a single Const into an RTE_RESULT.

Not sure it does everything correctly, but some basic cases work. In 
particular, I don't understand whether it needs any handling of "append 
relations".


-- 
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Attachment

Re: Optimze usage of immutable functions as relation

From
Thomas Munro
Date:
On Thu, Mar 21, 2019 at 5:58 AM Alexander Kuzmenkov
<a.kuzmenkov@postgrespro.ru> wrote:
> On 11/16/18 22:03, Tom Lane wrote:
> > A possible fix for this is to do eval_const_expressions() on
> > function RTE expressions at this stage (and then not need to
> > do it later), and then pull up only when we find that the
> > RTE expression has been reduced to a single Const.
>
>
> Attached is a patch that does this, and transforms RTE_FUCTION that was
> reduced to a single Const into an RTE_RESULT.

Hi Alexander,

The July Commitfest is here.  Could we please have a rebase of this patch?

Thanks,

-- 
Thomas Munro
https://enterprisedb.com



Re: Optimze usage of immutable functions as relation

From
Anastasia Lubennikova
Date:
08.07.2019 4:18, Thomas Munro:
> The July Commitfest is here.  Could we please have a rebase of this patch?
Updated patch is in attachments.
I've only resolved one small cosmetic merge conflict.

Later this week I'm going to send a more thoughtful review.

-- 
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Attachment

Re: Optimze usage of immutable functions as relation

From
Anastasia Lubennikova
Date:
23.07.2019 14:36, Anastasia Lubennikova :
> 08.07.2019 4:18, Thomas Munro:
>> The July Commitfest is here.  Could we please have a rebase of this 
>> patch?
> Updated patch is in attachments.
> I've only resolved one small cosmetic merge conflict.
>
> Later this week I'm going to send a more thoughtful review.
>
Well, I looked through the patch and didn't find any issues, so I'll 
mark this ready for committer.

Last implementation differs from the original one and is based on 
suggestion from this message:
https://www.postgresql.org/message-id/2906.1542395026%40sss.pgh.pa.us

It does eval_const_expressions() earlier and pulls up only Const functions.

I also added a few more tests and comments.
New version is in attachments.

-- 
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Attachment

Re: Optimze usage of immutable functions as relation

From
Tom Lane
Date:
Anastasia Lubennikova <a.lubennikova@postgrespro.ru> writes:
> New version is in attachments.

I took a quick look at this and I have a couple of gripes ---

* The naming and documentation of transform_const_function_to_result
seem pretty off-point to me.  ISTM the real goal of that function is to
pull up constant values from RTE_FUNCTION RTEs.  Replacing the RTE
with an RTE_RESULT is just a side-effect that's needed so that we
don't generate a useless FunctionScan plan node.  I'd probably call
it pull_up_constant_function or something like that.

* It would be useful for the commentary to point out that in principle we
could pull up any immutable (or, probably, even just stable) expression;
but we don't, (a) for fear of multiple evaluations of the result costing
us more than we can save, and (b) because a primary goal is to let the
constant participate in further const-folding, and of course that won't
happen for a non-Const.

* The test cases are really pretty bogus, because int4(1) or int4(0) are
not function invocations at all.  The parser thinks those are no-op type
casts, and so what comes out of parse analysis is already just a plain
Const.  Thus, not one of these test cases is actually verifying that
const-folding of an immutable function has happened before we try to
pull up.  While you could dodge the problem today by, say, writing
int8(0) which isn't a no-op cast, I'd recommend staying away from
typename() notation altogether.  There's too much baggage there and too
little certainty that you wrote a function call not something else.
The existing test cases you changed, with int4(sin(1)) and so on,
are better examples of something that has to actively be folded to
a constant.

            regards, tom lane



Re: Optimze usage of immutable functions as relation

From
Tom Lane
Date:
I wrote:
> * It would be useful for the commentary to point out that in principle we
> could pull up any immutable (or, probably, even just stable) expression;
> but we don't, (a) for fear of multiple evaluations of the result costing
> us more than we can save, and (b) because a primary goal is to let the
> constant participate in further const-folding, and of course that won't
> happen for a non-Const.

BTW, so far as I can see, none of the test cases demonstrate that such
further const-folding can happen.  Since this is all pretty processing-
order-sensitive, I think an explicit test that that's possible would
be a good idea.

            regards, tom lane



Re: Optimze usage of immutable functions as relation

From
Anastasia Lubennikova
Date:
26.07.2019 21:26, Tom Lane wrote:
> I took a quick look at this and I have a couple of gripes ---
>
> * The naming and documentation of transform_const_function_to_result
> seem pretty off-point to me.  ISTM the real goal of that function is to
> pull up constant values from RTE_FUNCTION RTEs.  Replacing the RTE
> with an RTE_RESULT is just a side-effect that's needed so that we
> don't generate a useless FunctionScan plan node.  I'd probably call
> it pull_up_constant_function or something like that.
>
> * It would be useful for the commentary to point out that in principle we
> could pull up any immutable (or, probably, even just stable) expression;
> but we don't, (a) for fear of multiple evaluations of the result costing
> us more than we can save, and (b) because a primary goal is to let the
> constant participate in further const-folding, and of course that won't
> happen for a non-Const.
Fixed
> * The test cases are really pretty bogus, because int4(1) or int4(0) are
> not function invocations at all.  The parser thinks those are no-op type
> casts, and so what comes out of parse analysis is already just a plain
> Const.  Thus, not one of these test cases is actually verifying that
> const-folding of an immutable function has happened before we try to
> pull up.  While you could dodge the problem today by, say, writing
> int8(0) which isn't a no-op cast, I'd recommend staying away from
> typename() notation altogether.  There's too much baggage there and too
> little certainty that you wrote a function call not something else.
> The existing test cases you changed, with int4(sin(1)) and so on,
> are better examples of something that has to actively be folded to
> a constant.

Thank you for pointing out on specific of int4() function,
I updated tests to use dummy plpgsql function.
I'm not sure if tests of various join types are redundant but I left them.

As far as I understand, the principal motivation of this patch was to 
optimize
function scan joins that occur in FTS queries. For example:
select * from test_tsquery, to_tsquery('english', 'new') q where 
txtsample @@ q;
So I also added another test to tsearch.sql to illustrate difference 
between optimized and not optimized plans.

-- 
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Attachment

Re: Optimze usage of immutable functions as relation

From
Tom Lane
Date:
Anastasia Lubennikova <a.lubennikova@postgrespro.ru> writes:
> Thank you for pointing out on specific of int4() function,
> I updated tests to use dummy plpgsql function.
> I'm not sure if tests of various join types are redundant but I left them.
> As far as I understand, the principal motivation of this patch was to 
> optimize
> function scan joins that occur in FTS queries. For example:
> select * from test_tsquery, to_tsquery('english', 'new') q where 
> txtsample @@ q;
> So I also added another test to tsearch.sql to illustrate difference 
> between optimized and not optimized plans.

Fair enough.

I've pushed this after a good deal of further hackery on the code.
Notably

* I had no faith that we still guaranteed to perform
eval_const_expressions on every function-RTE expression.  Even if
that were true today, it'd be awfully easy to break in future,
if it only happened somewhere down in the recursion in
pull_up_subqueries_recurse.  So I moved that responsibility to
an earlier spot that clearly traverses all function RTEs.
A happy side benefit is that inline_set_returning_function
gets simpler because it can assume that that already happened.

* It looked to me like the pullup_constant_function code wasn't
covering all the places where it might need to replace Vars
referencing the function RTE.  I refactored things to avoid
having a bunch of code duplication while ensuring it hits
everyplace that pull_up_simple_subquery does.

            regards, tom lane



Re: Optimze usage of immutable functions as relation

From
rmrodriguez@carto.com
Date:
Hi,

This commit is breaking some Postgis tests with custom types.

Here is a minimal repro (Postgis not required)

```
-- test custom types
create type t_custom_type AS (
valid bool,
reason varchar,
location varchar
);

create or replace function f_immutable_custom_type(i integer)
returns t_custom_type as
$$ declare oCustom t_custom_type;
begin
select into oCustom true as valid, 'OK' as reason, NULL as location;
return oCustom;
end; $$ language plpgsql immutable;

select valid, reason, location from f_immutable_custom_type(3);

drop function f_immutable_custom_type;
drop type t_custom_type;
```

Expected (PG12):
```
valid | reason | location
-------+--------+----------
 t     | OK     |
(1 row)
```

Instead with master/HEAD (eb57bd9c1d) we are getting:
```
ERROR:  could not find attribute 2 in subquery targetlist
```

Reverting 8613eda50488c27d848f8e8caa493c9d8e1b5271 fixes it,
but I haven't looked into the reason behind the bug.

Regards
--
Raúl Marín Rodríguez
carto.com



Re: Optimze usage of immutable functions as relation

From
Tom Lane
Date:
rmrodriguez@carto.com writes:
> This commit is breaking some Postgis tests with custom types.

Hm, yeah, the code fails to consider the possibility that the function
returns a composite type.

For the moment I'm just going to make it punt if the function result
class isn't TYPEFUNC_SCALAR.  In principle, if we have a composite
result, we could disassemble it into per-column constant values, but
I'm not sure it'd be worth the work.

            regards, tom lane