Thread: SQLFunctionCache and generic plans

SQLFunctionCache and generic plans

From
Ronan Dunklau
Date:
Hello,

It has been brought to my attention that SQL functions always use generic 
plans.

Take this function for example:

create or replace function test_plpgsql(p1 oid) returns text as $$
BEGIN
   RETURN (SELECT relname FROM pg_class WHERE oid = p1 OR p1 IS NULL LIMIT 1);   
END;
$$ language plpgsql;

As expected, the PlanCache takes care of generating parameter specific plans, 
and correctly prunes the redundant OR depending on wether we call the function 
with a NULL value or not:

ro=# select test_plpgsql(NULL);
LOG:  duration: 0.030 ms  plan:
Query Text: (SELECT relname FROM pg_class WHERE oid = p1 OR p1 IS NULL LIMIT 
1)
Result  (cost=0.04..0.05 rows=1 width=64)
  InitPlan 1 (returns $0)
    ->  Limit  (cost=0.00..0.04 rows=1 width=64)
          ->  Seq Scan on pg_class  (cost=0.00..18.12 rows=412 width=64)
LOG:  duration: 0.662 ms  plan:
Query Text: select test_plpgsql(NULL);
Result  (cost=0.00..0.26 rows=1 width=32)

ro=# select test_plpgsql(1);
LOG:  duration: 0.075 ms  plan:
Query Text: (SELECT relname FROM pg_class WHERE oid = p1 OR p1 IS NULL LIMIT 
1)
Result  (cost=8.29..8.30 rows=1 width=64)
  InitPlan 1 (returns $0)
    ->  Limit  (cost=0.27..8.29 rows=1 width=64)
          ->  Index Scan using pg_class_oid_index on pg_class  
(cost=0.27..8.29 rows=1 width=64)
                Index Cond: (oid = '1'::oid)
LOG:  duration: 0.675 ms  plan:
Query Text: select test_plpgsql(1);
Result  (cost=0.00..0.26 rows=1 width=32)


But writing the same function in SQL:
create or replace function test_sql(p1 oid) returns text as $$
SELECT relname FROM pg_class WHERE oid = p1 OR p1 IS NULL LIMIT 1
$$ language sql;

we end up with a generic plan:

ro=# select test_sql(1);
LOG:  duration: 0.287 ms  plan:
Query Text:  SELECT relname FROM pg_class WHERE oid = p1 OR p1 IS NULL LIMIT 1
Query Parameters: $1 = '1'
Limit  (cost=0.00..6.39 rows=1 width=32)
  ->  Seq Scan on pg_class  (cost=0.00..19.16 rows=3 width=32)
        Filter: ((oid = $1) OR ($1 IS NULL))

This is due to the fact that SQL functions are planned once for the whole 
query using a specific SQLFunctionCache instead of using the whole PlanCache 
machinery. 

The following comment can be found in functions.c, about the SQLFunctionCache:

 * Note that currently this has only the lifespan of the calling query.
 * Someday we should rewrite this code to use plancache.c to save parse/plan
 * results for longer than that.

I would be interested in working on this, primarily to avoid this problem of 
having generic query plans for SQL functions but maybe having a longer lived 
cache as well would be nice to have.

Is there any reason not too, or pitfalls we would like to avoid ?

Best regards,

--
Ronan Dunklau






Re: SQLFunctionCache and generic plans

From
Tom Lane
Date:
Ronan Dunklau <ronan.dunklau@aiven.io> writes:
> The following comment can be found in functions.c, about the SQLFunctionCache:

>  * Note that currently this has only the lifespan of the calling query.
>  * Someday we should rewrite this code to use plancache.c to save parse/plan
>  * results for longer than that.

> I would be interested in working on this, primarily to avoid this problem of
> having generic query plans for SQL functions but maybe having a longer lived
> cache as well would be nice to have.
> Is there any reason not too, or pitfalls we would like to avoid ?

AFAIR it's just lack of round tuits.  There would probably be some
semantic side-effects, though if you pay attention you could likely
make things better while you are at it.  The existing behavior of
parsing and planning all the statements at once is not very desirable
--- for instance, it doesn't work to do
    CREATE TABLE foo AS ...;
    SELECT * FROM foo;
I think if we're going to nuke this code and start over, we should
try to make that sort of case work.

            regards, tom lane



Re: SQLFunctionCache and generic plans

From
Alexander Korotkov
Date:
Hi, Alexander!

On Tue, Sep 3, 2024 at 10:33 AM Alexander Pyhalov
<a.pyhalov@postgrespro.ru> wrote:
> Tom Lane писал(а) 2023-02-07 18:29:
> > Ronan Dunklau <ronan.dunklau@aiven.io> writes:
> >> The following comment can be found in functions.c, about the
> >> SQLFunctionCache:
> >
> >>  * Note that currently this has only the lifespan of the calling
> >> query.
> >>  * Someday we should rewrite this code to use plancache.c to save
> >> parse/plan
> >>  * results for longer than that.
> >
> >> I would be interested in working on this, primarily to avoid this
> >> problem of
> >> having generic query plans for SQL functions but maybe having a longer
> >> lived
> >> cache as well would be nice to have.
> >> Is there any reason not too, or pitfalls we would like to avoid ?
> >
> > AFAIR it's just lack of round tuits.  There would probably be some
> > semantic side-effects, though if you pay attention you could likely
> > make things better while you are at it.  The existing behavior of
> > parsing and planning all the statements at once is not very desirable
> > --- for instance, it doesn't work to do
> >       CREATE TABLE foo AS ...;
> >       SELECT * FROM foo;
> > I think if we're going to nuke this code and start over, we should
> > try to make that sort of case work.
> >
> >                       regards, tom lane
>
> Hi.
>
> I've tried to make SQL functions use CachedPlan machinery. The main goal
> was to allow SQL functions to use custom plans
> (the work was started from question - why sql function is so slow
> compared to plpgsql one). It turned out that
> plpgsql function used custom plan and eliminated scan of all irrelevant
> sections, but
> exec-time pruning didn't cope with pruning when ScalarArrayOpExpr,
> filtering data using int[] parameter.
>
> In current prototype there are two restrictions. The first one is that
> CachecPlan has lifetime of a query - it's not
> saved for future use, as we don't have something like plpgsql hashtable
> for long live function storage. Second -
> SQL language functions in sql_body form (with stored queryTree_list) are
> handled in the old way, as we currently lack
> tools to make cached plans from query trees.
>
> Currently this change solves the issue of inefficient plans for queries
> over partitioned tables. For example, function like
>
> CREATE OR REPLACE FUNCTION public.test_get_records(ids integer[])
>   RETURNS SETOF test
>   LANGUAGE sql
> AS $function$
>      select *
>                   from test
>                   where id = any (ids)
> $function$;
>
> for hash-distributed table test can perform pruning in plan time  and
> can have plan like
>
>     Append  (cost=0.00..51.88 rows=26 width=36)
>            ->  Seq Scan on test_0 test_1  (cost=0.00..25.88 rows=13
> width=36)
>                  Filter: (id = ANY ('{1,2}'::integer[]))
>            ->  Seq Scan on test_2  (cost=0.00..25.88 rows=13 width=36)
>                  Filter: (id = ANY ('{1,2}'::integer[]))
>
> instead of
>
> Append  (cost=0.00..155.54 rows=248 width=36)
>            ->  Seq Scan on test_0 test_1  (cost=0.00..38.58 rows=62
> width=36)
>                  Filter: (id = ANY ($1))
>            ->  Seq Scan on test_1 test_2  (cost=0.00..38.58 rows=62
> width=36)
>                  Filter: (id = ANY ($1))
>            ->  Seq Scan on test_2 test_3  (cost=0.00..38.58 rows=62
> width=36)
>                  Filter: (id = ANY ($1))
>            ->  Seq Scan on test_3 test_4  (cost=0.00..38.58 rows=62
> width=36)
>                  Filter: (id = ANY ($1))
>
> This patch definitely requires more work, and I share it to get some
> early feedback.
>
> What should we do with "pre-parsed" SQL functions (when prosrc is
> empty)? How should we create cached plans when we don't have raw
> parsetrees?
> Currently we can create cached plans without raw parsetrees, but this
> means that plan revalidation doesn't work, choose_custom_plan()
> always returns false and we get generic plan. Perhaps, we need some form
> of GetCachedPlan(), which ignores raw_parse_tree?

I don't think you need a new form of GetCachedPlan().  Instead, it
seems that StmtPlanRequiresRevalidation() should be revised.  As I got
from comments and the d8b2fcc9d4 commit message, the primary goal was
to skip revalidation of utility statements.  Skipping revalidation was
a positive side effect, as long as we didn't support custom plans for
them anyway.  But as you're going to change this,
StmtPlanRequiresRevalidation() needs to be revised.

I also think it's not necessary to implement long-lived plan cache in
the initial patch.  The work could be split into two patches.  The
first could implement query lifetime plan cache.  This is beneficial
already by itself as you've shown by example.  The second could
implement long-lived plan cache.

I appreciate your work in this direction.  I hope you got the feedback
to go ahead and work on remaining issues.

------
Regards,
Alexander Korotkov
Supabase