Thread: SQLFunctionCache and generic plans

SQLFunctionCache and generic plans

From
Ronan Dunklau
Date:
Hello,

It has been brought to my attention that SQL functions always use generic 
plans.

Take this function for example:

create or replace function test_plpgsql(p1 oid) returns text as $$
BEGIN
   RETURN (SELECT relname FROM pg_class WHERE oid = p1 OR p1 IS NULL LIMIT 1);   
END;
$$ language plpgsql;

As expected, the PlanCache takes care of generating parameter specific plans, 
and correctly prunes the redundant OR depending on wether we call the function 
with a NULL value or not:

ro=# select test_plpgsql(NULL);
LOG:  duration: 0.030 ms  plan:
Query Text: (SELECT relname FROM pg_class WHERE oid = p1 OR p1 IS NULL LIMIT 
1)
Result  (cost=0.04..0.05 rows=1 width=64)
  InitPlan 1 (returns $0)
    ->  Limit  (cost=0.00..0.04 rows=1 width=64)
          ->  Seq Scan on pg_class  (cost=0.00..18.12 rows=412 width=64)
LOG:  duration: 0.662 ms  plan:
Query Text: select test_plpgsql(NULL);
Result  (cost=0.00..0.26 rows=1 width=32)

ro=# select test_plpgsql(1);
LOG:  duration: 0.075 ms  plan:
Query Text: (SELECT relname FROM pg_class WHERE oid = p1 OR p1 IS NULL LIMIT 
1)
Result  (cost=8.29..8.30 rows=1 width=64)
  InitPlan 1 (returns $0)
    ->  Limit  (cost=0.27..8.29 rows=1 width=64)
          ->  Index Scan using pg_class_oid_index on pg_class  
(cost=0.27..8.29 rows=1 width=64)
                Index Cond: (oid = '1'::oid)
LOG:  duration: 0.675 ms  plan:
Query Text: select test_plpgsql(1);
Result  (cost=0.00..0.26 rows=1 width=32)


But writing the same function in SQL:
create or replace function test_sql(p1 oid) returns text as $$
SELECT relname FROM pg_class WHERE oid = p1 OR p1 IS NULL LIMIT 1
$$ language sql;

we end up with a generic plan:

ro=# select test_sql(1);
LOG:  duration: 0.287 ms  plan:
Query Text:  SELECT relname FROM pg_class WHERE oid = p1 OR p1 IS NULL LIMIT 1
Query Parameters: $1 = '1'
Limit  (cost=0.00..6.39 rows=1 width=32)
  ->  Seq Scan on pg_class  (cost=0.00..19.16 rows=3 width=32)
        Filter: ((oid = $1) OR ($1 IS NULL))

This is due to the fact that SQL functions are planned once for the whole 
query using a specific SQLFunctionCache instead of using the whole PlanCache 
machinery. 

The following comment can be found in functions.c, about the SQLFunctionCache:

 * Note that currently this has only the lifespan of the calling query.
 * Someday we should rewrite this code to use plancache.c to save parse/plan
 * results for longer than that.

I would be interested in working on this, primarily to avoid this problem of 
having generic query plans for SQL functions but maybe having a longer lived 
cache as well would be nice to have.

Is there any reason not too, or pitfalls we would like to avoid ?

Best regards,

--
Ronan Dunklau






Re: SQLFunctionCache and generic plans

From
Tom Lane
Date:
Ronan Dunklau <ronan.dunklau@aiven.io> writes:
> The following comment can be found in functions.c, about the SQLFunctionCache:

>  * Note that currently this has only the lifespan of the calling query.
>  * Someday we should rewrite this code to use plancache.c to save parse/plan
>  * results for longer than that.

> I would be interested in working on this, primarily to avoid this problem of
> having generic query plans for SQL functions but maybe having a longer lived
> cache as well would be nice to have.
> Is there any reason not too, or pitfalls we would like to avoid ?

AFAIR it's just lack of round tuits.  There would probably be some
semantic side-effects, though if you pay attention you could likely
make things better while you are at it.  The existing behavior of
parsing and planning all the statements at once is not very desirable
--- for instance, it doesn't work to do
    CREATE TABLE foo AS ...;
    SELECT * FROM foo;
I think if we're going to nuke this code and start over, we should
try to make that sort of case work.

            regards, tom lane



Re: SQLFunctionCache and generic plans

From
Alexander Korotkov
Date:
Hi, Alexander!

On Tue, Sep 3, 2024 at 10:33 AM Alexander Pyhalov
<a.pyhalov@postgrespro.ru> wrote:
> Tom Lane писал(а) 2023-02-07 18:29:
> > Ronan Dunklau <ronan.dunklau@aiven.io> writes:
> >> The following comment can be found in functions.c, about the
> >> SQLFunctionCache:
> >
> >>  * Note that currently this has only the lifespan of the calling
> >> query.
> >>  * Someday we should rewrite this code to use plancache.c to save
> >> parse/plan
> >>  * results for longer than that.
> >
> >> I would be interested in working on this, primarily to avoid this
> >> problem of
> >> having generic query plans for SQL functions but maybe having a longer
> >> lived
> >> cache as well would be nice to have.
> >> Is there any reason not too, or pitfalls we would like to avoid ?
> >
> > AFAIR it's just lack of round tuits.  There would probably be some
> > semantic side-effects, though if you pay attention you could likely
> > make things better while you are at it.  The existing behavior of
> > parsing and planning all the statements at once is not very desirable
> > --- for instance, it doesn't work to do
> >       CREATE TABLE foo AS ...;
> >       SELECT * FROM foo;
> > I think if we're going to nuke this code and start over, we should
> > try to make that sort of case work.
> >
> >                       regards, tom lane
>
> Hi.
>
> I've tried to make SQL functions use CachedPlan machinery. The main goal
> was to allow SQL functions to use custom plans
> (the work was started from question - why sql function is so slow
> compared to plpgsql one). It turned out that
> plpgsql function used custom plan and eliminated scan of all irrelevant
> sections, but
> exec-time pruning didn't cope with pruning when ScalarArrayOpExpr,
> filtering data using int[] parameter.
>
> In current prototype there are two restrictions. The first one is that
> CachecPlan has lifetime of a query - it's not
> saved for future use, as we don't have something like plpgsql hashtable
> for long live function storage. Second -
> SQL language functions in sql_body form (with stored queryTree_list) are
> handled in the old way, as we currently lack
> tools to make cached plans from query trees.
>
> Currently this change solves the issue of inefficient plans for queries
> over partitioned tables. For example, function like
>
> CREATE OR REPLACE FUNCTION public.test_get_records(ids integer[])
>   RETURNS SETOF test
>   LANGUAGE sql
> AS $function$
>      select *
>                   from test
>                   where id = any (ids)
> $function$;
>
> for hash-distributed table test can perform pruning in plan time  and
> can have plan like
>
>     Append  (cost=0.00..51.88 rows=26 width=36)
>            ->  Seq Scan on test_0 test_1  (cost=0.00..25.88 rows=13
> width=36)
>                  Filter: (id = ANY ('{1,2}'::integer[]))
>            ->  Seq Scan on test_2  (cost=0.00..25.88 rows=13 width=36)
>                  Filter: (id = ANY ('{1,2}'::integer[]))
>
> instead of
>
> Append  (cost=0.00..155.54 rows=248 width=36)
>            ->  Seq Scan on test_0 test_1  (cost=0.00..38.58 rows=62
> width=36)
>                  Filter: (id = ANY ($1))
>            ->  Seq Scan on test_1 test_2  (cost=0.00..38.58 rows=62
> width=36)
>                  Filter: (id = ANY ($1))
>            ->  Seq Scan on test_2 test_3  (cost=0.00..38.58 rows=62
> width=36)
>                  Filter: (id = ANY ($1))
>            ->  Seq Scan on test_3 test_4  (cost=0.00..38.58 rows=62
> width=36)
>                  Filter: (id = ANY ($1))
>
> This patch definitely requires more work, and I share it to get some
> early feedback.
>
> What should we do with "pre-parsed" SQL functions (when prosrc is
> empty)? How should we create cached plans when we don't have raw
> parsetrees?
> Currently we can create cached plans without raw parsetrees, but this
> means that plan revalidation doesn't work, choose_custom_plan()
> always returns false and we get generic plan. Perhaps, we need some form
> of GetCachedPlan(), which ignores raw_parse_tree?

I don't think you need a new form of GetCachedPlan().  Instead, it
seems that StmtPlanRequiresRevalidation() should be revised.  As I got
from comments and the d8b2fcc9d4 commit message, the primary goal was
to skip revalidation of utility statements.  Skipping revalidation was
a positive side effect, as long as we didn't support custom plans for
them anyway.  But as you're going to change this,
StmtPlanRequiresRevalidation() needs to be revised.

I also think it's not necessary to implement long-lived plan cache in
the initial patch.  The work could be split into two patches.  The
first could implement query lifetime plan cache.  This is beneficial
already by itself as you've shown by example.  The second could
implement long-lived plan cache.

I appreciate your work in this direction.  I hope you got the feedback
to go ahead and work on remaining issues.

------
Regards,
Alexander Korotkov
Supabase



Re: SQLFunctionCache and generic plans

From
Pavel Stehule
Date:
Hi

út 31. 12. 2024 v 16:36 odesílatel Alexander Pyhalov <a.pyhalov@postgrespro.ru> napsal:
Hi.

>> What should we do with "pre-parsed" SQL functions (when prosrc is
>> empty)? How should we create cached plans when we don't have raw
>> parsetrees?
>> Currently we can create cached plans without raw parsetrees, but this
>> means that plan revalidation doesn't work, choose_custom_plan()
>> always returns false and we get generic plan. Perhaps, we need some
>> form
>> of GetCachedPlan(), which ignores raw_parse_tree?
>
> I don't think you need a new form of GetCachedPlan().  Instead, it
> seems that StmtPlanRequiresRevalidation() should be revised.  As I got
> from comments and the d8b2fcc9d4 commit message, the primary goal was
> to skip revalidation of utility statements.  Skipping revalidation was
> a positive side effect, as long as we didn't support custom plans for
> them anyway.  But as you're going to change this,
> StmtPlanRequiresRevalidation() needs to be revised.
>

Thanks for feedback.

I've modifed StmtPlanRequiresRevalidation() so that it looks on queries
command type.
Not sure if it's enough or I have to recreate something more similar to
stmt_requires_parse_analysis()
logic. I was wondering if this can lead to triggering plan revalidation
in RevalidateCachedQuery().
I suppose not - as we plan in executor (so shouldn't catch userid change
or see some changes in
related objects. Revalidation would kill our plan, destroying
resultDesc.

Also while looking at this, fixed processing of instead of rules (which
would lead to NULL execution_state).
--

there are lot of fails found by tester

Please, can you check it?

regards

Pavel
Best regards,
Alexander Pyhalov,
Postgres Professional

Re: SQLFunctionCache and generic plans

From
Pavel Stehule
Date:
Hi

čt 30. 1. 2025 v 9:50 odesílatel Alexander Pyhalov <a.pyhalov@postgrespro.ru> napsal:
Alexander Pyhalov писал(а) 2025-01-29 17:35:
> Tom Lane писал(а) 2025-01-17 21:27:
>> Alexander Pyhalov <a.pyhalov@postgrespro.ru> writes:
>>> I've rebased patch on master. Tests pass here.
>>
>> The cfbot still doesn't like it; my guess is that you built without
>> --with-libxml and so didn't notice the effects on xml.out.
>
> Hi. Thank you for review.
>
> I've updated patch.

Sorry, missed one local patch to fix memory bloat during replaning. Also
fixed a compiler warning.

Did you do some performance checks?

I tried some worst case

CREATE OR REPLACE FUNCTION fx(int)
RETURNS int AS $$
SELECT $1 + $1
$$ LANGUAGE SQL IMMUTABLE;

CREATE OR REPLACE FUNCTION fx2(int)
RETURNS int AS $$
SELECT $1 * 2
$$ LANGUAGE SQL IMMUTABLE;

do $$
begin
  for i in 1..1000000 loop
    perform fx(i); -- or fx2
  end loop;
end;
$$;
DO

The patched version reduces the difference between execution fx and fx2, but patched version is about 10% slower than unpatched.

The overhead of plan cache looks significant for simple cases (and a lot of SQL functions are very simple).

Regards

Pavel

 

--
Best regards,
Alexander Pyhalov,
Postgres Professional

Re: SQLFunctionCache and generic plans

From
Tom Lane
Date:
Pavel Stehule <pavel.stehule@gmail.com> writes:
> Did you do some performance checks?

This is a good question to ask ...

> I tried some worst case

> CREATE OR REPLACE FUNCTION fx(int)
> RETURNS int AS $$
> SELECT $1 + $1
> $$ LANGUAGE SQL IMMUTABLE;

... but I don't think tests like this will give helpful answers.
That function is simple enough to be inlined:

regression=# explain verbose select fx(f1) from int4_tbl;
                          QUERY PLAN                           
---------------------------------------------------------------
 Seq Scan on public.int4_tbl  (cost=0.00..1.06 rows=5 width=4)
   Output: (f1 + f1)
(2 rows)

So functions.c shouldn't have any involvement at all in the
actually-executed PERFORM expression, and whatever difference
you measured must have been noise.  (If the effect *is* real,
we'd better find out why.)

You need to test with a non-inline-able function.  Looking
at the inlining conditions in inline_function(), one simple
hack is to make the function return SETOF.  That'll only
exercise the returns-set path in functions.c though, so it'd
be advisable to check other inline-blocking conditions too.

            regards, tom lane



Re: SQLFunctionCache and generic plans

From
Pavel Stehule
Date:
Hi

I did multiple benchmarking, and still looks so the proposed patch doesn't help and has significant overhead

testcase:

create or replace function fx(int) returns int as $$ select $1 + $1; $$ language sql immutable;
create or replace function fx2(int) returns int as $$ select 2 * $1; $$ language sql immutable;

I tested

do $$              
begin
  for i in 1..1000000 loop
    perform fx((random()*100)::int); -- or fx2
  end loop;
end;
$$;

Results (master, patched):
fx: 17067 ms, 22165 ms
fx2: 2234 ms, 2311 ms

the execution of dynamic sql

2025-02-03 18:47:33) postgres=# do $$
begin
  for i in 1..1000000 loop
    execute 'select $1 + $1' using (random()*100)::int;
  end loop;
end;
$$;
DO
Time: 13412.990 ms (00:13.413)

In the profiler I see a significant overhead of the parser, so it looks like there is some more (overhead), but plan cache is not used.

Please, can somebody recheck my tests?

Regards

Pavel




Re: SQLFunctionCache and generic plans

From
Tom Lane
Date:
Pavel Stehule <pavel.stehule@gmail.com> writes:
> I did multiple benchmarking, and still looks so the proposed patch doesn't
> help and has significant overhead

Yeah, your fx() test case is clearly worse.  For me,

HEAD:

regression=# do $$
begin
  for i in 1..1000000 loop
    perform fx((random()*100)::int); -- or fx2
  end loop;
end;
$$;
DO
Time: 5229.184 ms (00:05.229)

PATCH:

regression=# do $$
begin
  for i in 1..1000000 loop
    perform fx((random()*100)::int); -- or fx2
  end loop;
end;
$$;
DO
Time: 6934.413 ms (00:06.934)

Adding some debug printout shows me that BuildCachedPlan is called to
construct a custom plan on every single execution, which is presumably
because the patch doesn't make any attempt to carry plancache state
across successive executions of the same query.  If we were saving that
state it would have soon switched to a generic plan and then won big.

So, even though I thought we could leave that for later, it seems like
maybe we have to have it before we'll have a committable patch.

There might be some residual inefficiency in there though.  In the
unpatched code we'd be calling pg_parse_query and pg_plan_query once
per execution.  You'd think that would cost more than BuildCachedPlan,
which can skip the raw-parsing part.  Even more interesting, the
patch gets slower yet if we use a new-style SQL function:

regression=# create or replace function fx3 (int) returns int immutable
regression-# begin atomic select $1 + $1; end;
CREATE FUNCTION
Time: 0.813 ms
regression=# do $$                                                     
begin                            
  for i in 1..1000000 loop
    perform fx3((random()*100)::int); -- or fx2
  end loop;
end;
$$;
DO
Time: 8007.062 ms (00:08.007)

That makes no sense either, because with a new-style SQL
function we should be skipping parse analysis as well.

But wait: HEAD takes
Time: 6632.709 ms (00:06.633)
to do the same thing.  So somehow the new-style SQL function
stuff is very materially slower in this use-case, with or
without this patch.  I do not understand why.

Definitely some performance investigation needs to be done here.
Even without cross-query plan caching, I don't see why the
patch isn't better than it is.  It ought to be at least
competitive with the unpatched code.

(I've not read the v5 patch yet, so I have no theories.)

            regards, tom lane



Re: SQLFunctionCache and generic plans

From
Tom Lane
Date:
I wrote:
> But wait: HEAD takes
> Time: 6632.709 ms (00:06.633)
> to do the same thing.  So somehow the new-style SQL function
> stuff is very materially slower in this use-case, with or
> without this patch.  I do not understand why.

"perf" tells me that in the fx3 test, a full third of the runtime
is spent inside stringToNode(), with about three-quarters of that
going into pg_strtok().  This makes no sense to me: we'll be reading
the prosqlbody of fx3(), sure, but that's not enormously long (about
1200 bytes).  And pg_strtok() doesn't look remarkably slow.  There's
no way this should be taking more time than raw parsing + parse
analysis, even for such a trivial query as "select $1 + $1".

There's been some talk of getting rid of our existing nodetree
storage format in favor of something more efficient.  Maybe we
should put a higher priority on getting that done.  But anyway,
that seems orthogonal to the current patch.

> Even without cross-query plan caching, I don't see why the
> patch isn't better than it is.  It ought to be at least
> competitive with the unpatched code.

This remains true.

            regards, tom lane



Re: SQLFunctionCache and generic plans

From
Pavel Stehule
Date:
hI

I can confirm 60% speedup for execution of function fx and fx3 - both functions are very primitive, so for real code the benefit can be higher

Unfortunately, there is about 5% slowdown for inlined code, and for just plpgsql code too.

I tested fx4

create or replace function fx4(int) returns int immutable as $$ begin return $1 + $1; end $$ language plpgsql;

and fx2

create or replace function fx2(int) returns int as $$ select 2 * $1; $$
language sql immutable;

and execution of patched code is about 5% slower. It is strange so this patch has a negative impact on plpgsql execution.

Regards

Pavel

Re: SQLFunctionCache and generic plans

From
Alexander Pyhalov
Date:
Pavel Stehule писал(а) 2025-02-26 22:34:
> hI
> 
> I can confirm 60% speedup for execution of function fx and fx3 - both
> functions are very primitive, so for real code the benefit can be
> higher
> 
> Unfortunately, there is about 5% slowdown for inlined code, and for
> just plpgsql code too.
> 
> I tested fx4
> 
> create or replace function fx4(int) returns int immutable as $$ begin
> return $1 + $1; end $$ language plpgsql;
> 
> and fx2
> 
> create or replace function fx2(int) returns int as $$ select 2 * $1;
> $$
> language sql immutable;
> 
> and execution of patched code is about 5% slower. It is strange so
> this patch has a negative impact on plpgsql execution.
> 
> Regards
> 
> Pavel

Hi. I've tried to reproduce slowdown and couldn't.

create or replace function fx4(int) returns int immutable as $$ begin 
return $1 + $1; end $$ language plpgsql;

do $$
begin
   for i in 1..5000000 loop
     perform fx4((random()*100)::int); -- or fx2
   end loop;
end;
$$;

OLD results:
Time: 8268.614 ms (00:08.269)
Time: 8178.836 ms (00:08.179)
Time: 8306.694 ms (00:08.307)

New (patched) results:
Time: 7743.945 ms (00:07.744)
Time: 7803.109 ms (00:07.803)
Time: 7736.735 ms (00:07.737)

Not sure why new is faster (perhaps, some noise?) - looking at perf 
flamegraphs I don't see something evident.

create or replace function fx2(int) returns int as $$ select 2 * $1; $$ 
language sql immutable;
do $$
begin
   for i in 1..5000000 loop
     perform fx2((random()*100)::int); -- or fx2
   end loop;
end;
$$;

OLD results:
Time: 5346.471 ms (00:05.346)
Time: 5359.222 ms (00:05.359)
Time: 5316.747 ms (00:05.317)

New (patched) results:
Time: 5188.363 ms (00:05.188)
Time: 5225.322 ms (00:05.225)
Time: 5203.667 ms (00:05.204)

-- 
Best regards,
Alexander Pyhalov,
Postgres Professional



Re: SQLFunctionCache and generic plans

From
Pavel Stehule
Date:


čt 27. 2. 2025 v 13:25 odesílatel Alexander Pyhalov <a.pyhalov@postgrespro.ru> napsal:
Pavel Stehule писал(а) 2025-02-26 22:34:
> hI
>
> I can confirm 60% speedup for execution of function fx and fx3 - both
> functions are very primitive, so for real code the benefit can be
> higher
>
> Unfortunately, there is about 5% slowdown for inlined code, and for
> just plpgsql code too.
>
> I tested fx4
>
> create or replace function fx4(int) returns int immutable as $$ begin
> return $1 + $1; end $$ language plpgsql;
>
> and fx2
>
> create or replace function fx2(int) returns int as $$ select 2 * $1;
> $$
> language sql immutable;
>
> and execution of patched code is about 5% slower. It is strange so
> this patch has a negative impact on plpgsql execution.
>
> Regards
>
> Pavel

Hi. I've tried to reproduce slowdown and couldn't.

create or replace function fx4(int) returns int immutable as $$ begin
return $1 + $1; end $$ language plpgsql;

do $$
begin
   for i in 1..5000000 loop
     perform fx4((random()*100)::int); -- or fx2
   end loop;
end;
$$;

OLD results:
Time: 8268.614 ms (00:08.269)
Time: 8178.836 ms (00:08.179)
Time: 8306.694 ms (00:08.307)

New (patched) results:
Time: 7743.945 ms (00:07.744)
Time: 7803.109 ms (00:07.803)
Time: 7736.735 ms (00:07.737)

Not sure why new is faster (perhaps, some noise?) - looking at perf
flamegraphs I don't see something evident.

create or replace function fx2(int) returns int as $$ select 2 * $1; $$
language sql immutable;
do $$
begin
   for i in 1..5000000 loop
     perform fx2((random()*100)::int); -- or fx2
   end loop;
end;
$$;

OLD results:
Time: 5346.471 ms (00:05.346)
Time: 5359.222 ms (00:05.359)
Time: 5316.747 ms (00:05.317)

New (patched) results:
Time: 5188.363 ms (00:05.188)
Time: 5225.322 ms (00:05.225)
Time: 5203.667 ms (00:05.204)

I'll try to get profiles.

Regards

Pavel

--
Best regards,
Alexander Pyhalov,
Postgres Professional

Re: SQLFunctionCache and generic plans

From
Tom Lane
Date:
Pavel Stehule <pavel.stehule@gmail.com> writes:
> čt 27. 2. 2025 v 13:25 odesílatel Alexander Pyhalov <
> a.pyhalov@postgrespro.ru> napsal:
>>> Unfortunately, there is about 5% slowdown for inlined code, and for
>>> just plpgsql code too.

>> Hi. I've tried to reproduce slowdown and couldn't.

> I'll try to get profiles.

I tried to reproduce this too.  What I got on my usual development
workstation (RHEL8/gcc 8.5.0 on x86_64) was:

fx2 example: v6 patch about 2.4% slower than HEAD
fx4 example: v6 patch about 7.3% slower than HEAD

I was quite concerned after that result, but then I tried it on
another machine (macOS/clang 16.0.0 on Apple M1) and got:

fx2 example: v6 patch about 0.2% slower than HEAD
fx4 example: v6 patch about 0.7% faster than HEAD

(These are average-of-three-runs tests on --disable-cassert
builds; I trust you guys were not doing performance tests on
assert-enabled builds?)

So taken together, our results are all over the map, anywhere
from 7% speedup to 7% slowdown.  My usual rule of thumb is that
you can see up to 2% variation in this kind of microbenchmark even
when "nothing has changed", just due to random build details like
whether critical loops cross a cacheline or not.  7% is pretty
well above that threshold, but maybe it's just random build
variation anyway.

Furthermore, since neither example involves functions.c at all
(fx2 would be inlined, and fx4 isn't SQL-language), it's hard
to see how the patch would directly affect either example unless
it were adding overhead to plancache.c.  And I don't see any
changes there that would amount to meaningful overhead for the
existing use-case with a raw parse tree.

So right at the moment I'm inclined to write this off as
measurement noise.  Perhaps it'd be worth checking a few
more platforms, though.

            regards, tom lane



Re: SQLFunctionCache and generic plans

From
Pavel Stehule
Date:


čt 27. 2. 2025 v 20:52 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:
Pavel Stehule <pavel.stehule@gmail.com> writes:
> čt 27. 2. 2025 v 13:25 odesílatel Alexander Pyhalov <
> a.pyhalov@postgrespro.ru> napsal:
>>> Unfortunately, there is about 5% slowdown for inlined code, and for
>>> just plpgsql code too.

>> Hi. I've tried to reproduce slowdown and couldn't.

> I'll try to get profiles.

I tried to reproduce this too.  What I got on my usual development
workstation (RHEL8/gcc 8.5.0 on x86_64) was:

fx2 example: v6 patch about 2.4% slower than HEAD
fx4 example: v6 patch about 7.3% slower than HEAD

I was quite concerned after that result, but then I tried it on
another machine (macOS/clang 16.0.0 on Apple M1) and got:

fx2 example: v6 patch about 0.2% slower than HEAD
fx4 example: v6 patch about 0.7% faster than HEAD

(These are average-of-three-runs tests on --disable-cassert
builds; I trust you guys were not doing performance tests on
assert-enabled builds?)

So taken together, our results are all over the map, anywhere
from 7% speedup to 7% slowdown.  My usual rule of thumb is that

Where do you see 7% speedup? Few lines up you wrote 0.7% faster.


you can see up to 2% variation in this kind of microbenchmark even
when "nothing has changed", just due to random build details like
whether critical loops cross a cacheline or not.  7% is pretty
well above that threshold, but maybe it's just random build
variation anyway.

Furthermore, since neither example involves functions.c at all
(fx2 would be inlined, and fx4 isn't SQL-language), it's hard
to see how the patch would directly affect either example unless
it were adding overhead to plancache.c.  And I don't see any
changes there that would amount to meaningful overhead for the
existing use-case with a raw parse tree.

So right at the moment I'm inclined to write this off as
measurement noise.  Perhaps it'd be worth checking a few
more platforms, though.

                        regards, tom lane

Re: SQLFunctionCache and generic plans

From
Tom Lane
Date:
Alexander Pyhalov <a.pyhalov@postgrespro.ru> writes:
> Now sql functions plans are actually saved. The most of it is a 
> simplified version of plpgsql plan cache. Perhaps, I've missed 
> something.

A couple of thoughts about v6:

* I don't think it's okay to just summarily do this:

            /* It's stale; unlink and delete */
            fcinfo->flinfo->fn_extra = NULL;
            MemoryContextDelete(fcache->fcontext);
            fcache = NULL;

when fmgr_sql sees that the cache is stale.  If we're
doing a nested call of a recursive SQL function, this'd be
cutting the legs out from under the outer recursion level.
plpgsql goes to some lengths to do reference-counting of
function cache entries, and I think you need the same here.

* I don't like much of anything about 0004.  It's messy and it
gives up all the benefit of plan caching in some pretty-common
cases (anywhere where the user was sloppy about what data type
is being returned).  I wonder if we couldn't solve that with
more finesse by applying check_sql_fn_retval() to the query tree
after parse analysis, but before we hand it to plancache.c?
This is different from what happens now because we'd be applying
it before not after query rewrite, but I don't think that
query rewrite ever changes the targetlist results.  Another
point is that the resultTargetList output might change subtly,
but I don't think we care there either: I believe we only examine
that output for its result data types and resjunk markers.
(This is nonobvious and inadequately documented, but for sure we
couldn't try to execute that tlist --- it's never passed through
the planner.)

* One diff that caught my eye was

-    if (!ActiveSnapshotSet() &&
-        plansource->raw_parse_tree &&
-        analyze_requires_snapshot(plansource->raw_parse_tree))
+    if (!ActiveSnapshotSet() && StmtPlanRequiresRevalidation(plansource))

Because StmtPlanRequiresRevalidation uses
stmt_requires_parse_analysis, this is basically throwing away the
distinction between stmt_requires_parse_analysis and
analyze_requires_snapshot.  I do not think that's okay, for the
reasons explained in analyze_requires_snapshot.  We should expend the
additional notational overhead to keep those things separate.

* I'm also not thrilled by teaching StmtPlanRequiresRevalidation
how to do something equivalent to stmt_requires_parse_analysis for
Query trees.  The entire point of the current division of labor is
for it *not* to know that, but keep the knowledge of the properties
of different statement types in parser/analyze.c.  So it looks to me
like we need to add a function to parser/analyze.c that does this.
Not quite sure what to call it though.
querytree_requires_parse_analysis() would be a weird name, because
if it's a Query tree then it's already been through parse analysis.
Maybe querytree_requires_revalidation()?

            regards, tom lane



Re: SQLFunctionCache and generic plans

From
Tom Lane
Date:
Pavel Stehule <pavel.stehule@gmail.com> writes:
> čt 27. 2. 2025 v 20:52 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:
>> So taken together, our results are all over the map, anywhere
>> from 7% speedup to 7% slowdown.  My usual rule of thumb is that

> Where do you see 7% speedup? Few lines up you wrote 0.7% faster.

Alexander got that on the fx4 case, according to his response a
few messages ago [1].  It'd be good if someone else could reproduce
that, because right now we have two "it's slower" results versus
only one "it's faster".

            regards, tom lane

[1] https://www.postgresql.org/message-id/e5724d1ba8398c7ff20ead1de73b4db4%40postgrespro.ru



Re: SQLFunctionCache and generic plans

From
Pavel Stehule
Date:
Hi

čt 27. 2. 2025 v 21:45 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:
Pavel Stehule <pavel.stehule@gmail.com> writes:
> čt 27. 2. 2025 v 20:52 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:
>> So taken together, our results are all over the map, anywhere
>> from 7% speedup to 7% slowdown.  My usual rule of thumb is that

> Where do you see 7% speedup? Few lines up you wrote 0.7% faster.

Alexander got that on the fx4 case, according to his response a
few messages ago [1].  It'd be good if someone else could reproduce
that, because right now we have two "it's slower" results versus
only one "it's faster".

ok

here is a profile from master 

    6.98%  postgres  postgres           [.] hash_bytes
   6.30%  postgres  postgres           [.] palloc0
   3.57%  postgres  postgres           [.] SearchCatCacheInternal
   3.29%  postgres  postgres           [.] AllocSetAlloc
   2.65%  postgres  plpgsql.so         [.] exec_stmts
   2.55%  postgres  postgres           [.] expression_tree_walker_impl
   2.34%  postgres  postgres           [.] _SPI_execute_plan
   2.13%  postgres  postgres           [.] CheckExprStillValid
   2.02%  postgres  postgres           [.] fmgr_info_cxt_security
   1.89%  postgres  postgres           [.] ExecInitFunc
   1.51%  postgres  postgres           [.] ExecInterpExpr
   1.48%  postgres  postgres           [.] ResourceOwnerForget
   1.44%  postgres  postgres           [.] AllocSetReset
   1.35%  postgres  postgres           [.] MemoryContextCreate
   1.30%  postgres  plpgsql.so         [.] plpgsql_exec_function
   1.29%  postgres  libc.so.6          [.] __memcmp_sse2
   1.24%  postgres  postgres           [.] MemoryContextDelete
   1.13%  postgres  postgres           [.] check_stack_depth
   1.11%  postgres  postgres           [.] AllocSetContextCreateInternal
   1.09%  postgres  postgres           [.] resolve_polymorphic_argtypes
   1.08%  postgres  postgres           [.] hash_search_with_hash_value
   1.07%  postgres  postgres           [.] standard_ExecutorStart
   1.07%  postgres  postgres           [.] ExprEvalPushStep
   1.04%  postgres  postgres           [.] ExecInitExprRec
   0.95%  postgres  plpgsql.so         [.] plpgsql_estate_setup
   0.91%  postgres  postgres           [.] ExecReadyInterpretedExp

and from patched

   7.08%  postgres  postgres           [.] hash_bytes
   6.25%  postgres  postgres           [.] palloc0
   3.52%  postgres  postgres           [.] SearchCatCacheInternal
   3.30%  postgres  postgres           [.] AllocSetAlloc
   2.39%  postgres  postgres           [.] expression_tree_walker_impl
   2.37%  postgres  plpgsql.so         [.] exec_stmts
   2.15%  postgres  postgres           [.] _SPI_execute_plan
   2.10%  postgres  postgres           [.] CheckExprStillValid
   1.94%  postgres  postgres           [.] fmgr_info_cxt_security
   1.71%  postgres  postgres           [.] ExecInitFunc
   1.41%  postgres  postgres           [.] AllocSetReset
   1.40%  postgres  postgres           [.] ExecInterpExpr
   1.38%  postgres  postgres           [.] ExprEvalPushStep
   1.34%  postgres  postgres           [.] ResourceOwnerForget
   1.31%  postgres  postgres           [.] MemoryContextDelete
   1.24%  postgres  libc.so.6          [.] __memcmp_sse2
   1.21%  postgres  postgres           [.] MemoryContextCreate
   1.18%  postgres  postgres           [.] AllocSetContextCreateInternal
   1.17%  postgres  postgres           [.] hash_search_with_hash_value
   1.13%  postgres  postgres           [.] resolve_polymorphic_argtypes
   1.13%  postgres  plpgsql.so         [.] plpgsql_exec_function
   1.03%  postgres  postgres           [.] standard_ExecutorStart
   0.98%  postgres  postgres           [.] ExecInitExprRec
   0.96%  postgres  postgres           [.] check_stack_depth

looks so there is only one significant differences

ExprEvalPushStep 1.07 x 1.38%

Regards

Pavel

compiled without assertions on gcc 15 with 02

vendor_id : GenuineIntel
cpu family : 6
model : 42
model name : Intel(R) Core(TM) i7-2630QM CPU @ 2.00GHz
stepping : 7
microcode : 0x2f
cpu MHz : 2691.102
cache size : 6144 KB





                        regards, tom lane

[1] https://www.postgresql.org/message-id/e5724d1ba8398c7ff20ead1de73b4db4%40postgrespro.ru

Re: SQLFunctionCache and generic plans

From
Pavel Stehule
Date:
Hi

pá 28. 2. 2025 v 7:29 odesílatel Pavel Stehule <pavel.stehule@gmail.com> napsal:
Hi

čt 27. 2. 2025 v 21:45 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:
Pavel Stehule <pavel.stehule@gmail.com> writes:
> čt 27. 2. 2025 v 20:52 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:
>> So taken together, our results are all over the map, anywhere
>> from 7% speedup to 7% slowdown.  My usual rule of thumb is that

> Where do you see 7% speedup? Few lines up you wrote 0.7% faster.

Alexander got that on the fx4 case, according to his response a
few messages ago [1].  It'd be good if someone else could reproduce
that, because right now we have two "it's slower" results versus
only one "it's faster".

ok

here is a profile from master 

    6.98%  postgres  postgres           [.] hash_bytes
   6.30%  postgres  postgres           [.] palloc0
   3.57%  postgres  postgres           [.] SearchCatCacheInternal
   3.29%  postgres  postgres           [.] AllocSetAlloc
   2.65%  postgres  plpgsql.so         [.] exec_stmts
   2.55%  postgres  postgres           [.] expression_tree_walker_impl
   2.34%  postgres  postgres           [.] _SPI_execute_plan
   2.13%  postgres  postgres           [.] CheckExprStillValid
   2.02%  postgres  postgres           [.] fmgr_info_cxt_security
   1.89%  postgres  postgres           [.] ExecInitFunc
   1.51%  postgres  postgres           [.] ExecInterpExpr
   1.48%  postgres  postgres           [.] ResourceOwnerForget
   1.44%  postgres  postgres           [.] AllocSetReset
   1.35%  postgres  postgres           [.] MemoryContextCreate
   1.30%  postgres  plpgsql.so         [.] plpgsql_exec_function
   1.29%  postgres  libc.so.6          [.] __memcmp_sse2
   1.24%  postgres  postgres           [.] MemoryContextDelete
   1.13%  postgres  postgres           [.] check_stack_depth
   1.11%  postgres  postgres           [.] AllocSetContextCreateInternal
   1.09%  postgres  postgres           [.] resolve_polymorphic_argtypes
   1.08%  postgres  postgres           [.] hash_search_with_hash_value
   1.07%  postgres  postgres           [.] standard_ExecutorStart
   1.07%  postgres  postgres           [.] ExprEvalPushStep
   1.04%  postgres  postgres           [.] ExecInitExprRec
   0.95%  postgres  plpgsql.so         [.] plpgsql_estate_setup
   0.91%  postgres  postgres           [.] ExecReadyInterpretedExp

and from patched

   7.08%  postgres  postgres           [.] hash_bytes
   6.25%  postgres  postgres           [.] palloc0
   3.52%  postgres  postgres           [.] SearchCatCacheInternal
   3.30%  postgres  postgres           [.] AllocSetAlloc
   2.39%  postgres  postgres           [.] expression_tree_walker_impl
   2.37%  postgres  plpgsql.so         [.] exec_stmts
   2.15%  postgres  postgres           [.] _SPI_execute_plan
   2.10%  postgres  postgres           [.] CheckExprStillValid
   1.94%  postgres  postgres           [.] fmgr_info_cxt_security
   1.71%  postgres  postgres           [.] ExecInitFunc
   1.41%  postgres  postgres           [.] AllocSetReset
   1.40%  postgres  postgres           [.] ExecInterpExpr
   1.38%  postgres  postgres           [.] ExprEvalPushStep
   1.34%  postgres  postgres           [.] ResourceOwnerForget
   1.31%  postgres  postgres           [.] MemoryContextDelete
   1.24%  postgres  libc.so.6          [.] __memcmp_sse2
   1.21%  postgres  postgres           [.] MemoryContextCreate
   1.18%  postgres  postgres           [.] AllocSetContextCreateInternal
   1.17%  postgres  postgres           [.] hash_search_with_hash_value
   1.13%  postgres  postgres           [.] resolve_polymorphic_argtypes
   1.13%  postgres  plpgsql.so         [.] plpgsql_exec_function
   1.03%  postgres  postgres           [.] standard_ExecutorStart
   0.98%  postgres  postgres           [.] ExecInitExprRec
   0.96%  postgres  postgres           [.] check_stack_depth

looks so there is only one significant differences

ExprEvalPushStep 1.07 x 1.38%

Regards

Pavel

compiled without assertions on gcc 15 with 02

vendor_id : GenuineIntel
cpu family : 6
model : 42
model name : Intel(R) Core(TM) i7-2630QM CPU @ 2.00GHz
stepping : 7
microcode : 0x2f
cpu MHz : 2691.102
cache size : 6144 KB


I tested the patches on another notebook with more recent cpu

vendor_id : GenuineIntel
cpu family : 6
model : 154
model name : 12th Gen Intel(R) Core(TM) i7-12700H
stepping : 3
microcode : 0x436
cpu MHz : 400.000
cache size : 24576 KB

And the difference are smaller - about 3%

Regards

Pavel





                        regards, tom lane

[1] https://www.postgresql.org/message-id/e5724d1ba8398c7ff20ead1de73b4db4%40postgrespro.ru