Thread: SQLFunctionCache and generic plans
Hello, It has been brought to my attention that SQL functions always use generic plans. Take this function for example: create or replace function test_plpgsql(p1 oid) returns text as $$ BEGIN RETURN (SELECT relname FROM pg_class WHERE oid = p1 OR p1 IS NULL LIMIT 1); END; $$ language plpgsql; As expected, the PlanCache takes care of generating parameter specific plans, and correctly prunes the redundant OR depending on wether we call the function with a NULL value or not: ro=# select test_plpgsql(NULL); LOG: duration: 0.030 ms plan: Query Text: (SELECT relname FROM pg_class WHERE oid = p1 OR p1 IS NULL LIMIT 1) Result (cost=0.04..0.05 rows=1 width=64) InitPlan 1 (returns $0) -> Limit (cost=0.00..0.04 rows=1 width=64) -> Seq Scan on pg_class (cost=0.00..18.12 rows=412 width=64) LOG: duration: 0.662 ms plan: Query Text: select test_plpgsql(NULL); Result (cost=0.00..0.26 rows=1 width=32) ro=# select test_plpgsql(1); LOG: duration: 0.075 ms plan: Query Text: (SELECT relname FROM pg_class WHERE oid = p1 OR p1 IS NULL LIMIT 1) Result (cost=8.29..8.30 rows=1 width=64) InitPlan 1 (returns $0) -> Limit (cost=0.27..8.29 rows=1 width=64) -> Index Scan using pg_class_oid_index on pg_class (cost=0.27..8.29 rows=1 width=64) Index Cond: (oid = '1'::oid) LOG: duration: 0.675 ms plan: Query Text: select test_plpgsql(1); Result (cost=0.00..0.26 rows=1 width=32) But writing the same function in SQL: create or replace function test_sql(p1 oid) returns text as $$ SELECT relname FROM pg_class WHERE oid = p1 OR p1 IS NULL LIMIT 1 $$ language sql; we end up with a generic plan: ro=# select test_sql(1); LOG: duration: 0.287 ms plan: Query Text: SELECT relname FROM pg_class WHERE oid = p1 OR p1 IS NULL LIMIT 1 Query Parameters: $1 = '1' Limit (cost=0.00..6.39 rows=1 width=32) -> Seq Scan on pg_class (cost=0.00..19.16 rows=3 width=32) Filter: ((oid = $1) OR ($1 IS NULL)) This is due to the fact that SQL functions are planned once for the whole query using a specific SQLFunctionCache instead of using the whole PlanCache machinery. The following comment can be found in functions.c, about the SQLFunctionCache: * Note that currently this has only the lifespan of the calling query. * Someday we should rewrite this code to use plancache.c to save parse/plan * results for longer than that. I would be interested in working on this, primarily to avoid this problem of having generic query plans for SQL functions but maybe having a longer lived cache as well would be nice to have. Is there any reason not too, or pitfalls we would like to avoid ? Best regards, -- Ronan Dunklau
Ronan Dunklau <ronan.dunklau@aiven.io> writes: > The following comment can be found in functions.c, about the SQLFunctionCache: > * Note that currently this has only the lifespan of the calling query. > * Someday we should rewrite this code to use plancache.c to save parse/plan > * results for longer than that. > I would be interested in working on this, primarily to avoid this problem of > having generic query plans for SQL functions but maybe having a longer lived > cache as well would be nice to have. > Is there any reason not too, or pitfalls we would like to avoid ? AFAIR it's just lack of round tuits. There would probably be some semantic side-effects, though if you pay attention you could likely make things better while you are at it. The existing behavior of parsing and planning all the statements at once is not very desirable --- for instance, it doesn't work to do CREATE TABLE foo AS ...; SELECT * FROM foo; I think if we're going to nuke this code and start over, we should try to make that sort of case work. regards, tom lane
Hi, Alexander! On Tue, Sep 3, 2024 at 10:33 AM Alexander Pyhalov <a.pyhalov@postgrespro.ru> wrote: > Tom Lane писал(а) 2023-02-07 18:29: > > Ronan Dunklau <ronan.dunklau@aiven.io> writes: > >> The following comment can be found in functions.c, about the > >> SQLFunctionCache: > > > >> * Note that currently this has only the lifespan of the calling > >> query. > >> * Someday we should rewrite this code to use plancache.c to save > >> parse/plan > >> * results for longer than that. > > > >> I would be interested in working on this, primarily to avoid this > >> problem of > >> having generic query plans for SQL functions but maybe having a longer > >> lived > >> cache as well would be nice to have. > >> Is there any reason not too, or pitfalls we would like to avoid ? > > > > AFAIR it's just lack of round tuits. There would probably be some > > semantic side-effects, though if you pay attention you could likely > > make things better while you are at it. The existing behavior of > > parsing and planning all the statements at once is not very desirable > > --- for instance, it doesn't work to do > > CREATE TABLE foo AS ...; > > SELECT * FROM foo; > > I think if we're going to nuke this code and start over, we should > > try to make that sort of case work. > > > > regards, tom lane > > Hi. > > I've tried to make SQL functions use CachedPlan machinery. The main goal > was to allow SQL functions to use custom plans > (the work was started from question - why sql function is so slow > compared to plpgsql one). It turned out that > plpgsql function used custom plan and eliminated scan of all irrelevant > sections, but > exec-time pruning didn't cope with pruning when ScalarArrayOpExpr, > filtering data using int[] parameter. > > In current prototype there are two restrictions. The first one is that > CachecPlan has lifetime of a query - it's not > saved for future use, as we don't have something like plpgsql hashtable > for long live function storage. Second - > SQL language functions in sql_body form (with stored queryTree_list) are > handled in the old way, as we currently lack > tools to make cached plans from query trees. > > Currently this change solves the issue of inefficient plans for queries > over partitioned tables. For example, function like > > CREATE OR REPLACE FUNCTION public.test_get_records(ids integer[]) > RETURNS SETOF test > LANGUAGE sql > AS $function$ > select * > from test > where id = any (ids) > $function$; > > for hash-distributed table test can perform pruning in plan time and > can have plan like > > Append (cost=0.00..51.88 rows=26 width=36) > -> Seq Scan on test_0 test_1 (cost=0.00..25.88 rows=13 > width=36) > Filter: (id = ANY ('{1,2}'::integer[])) > -> Seq Scan on test_2 (cost=0.00..25.88 rows=13 width=36) > Filter: (id = ANY ('{1,2}'::integer[])) > > instead of > > Append (cost=0.00..155.54 rows=248 width=36) > -> Seq Scan on test_0 test_1 (cost=0.00..38.58 rows=62 > width=36) > Filter: (id = ANY ($1)) > -> Seq Scan on test_1 test_2 (cost=0.00..38.58 rows=62 > width=36) > Filter: (id = ANY ($1)) > -> Seq Scan on test_2 test_3 (cost=0.00..38.58 rows=62 > width=36) > Filter: (id = ANY ($1)) > -> Seq Scan on test_3 test_4 (cost=0.00..38.58 rows=62 > width=36) > Filter: (id = ANY ($1)) > > This patch definitely requires more work, and I share it to get some > early feedback. > > What should we do with "pre-parsed" SQL functions (when prosrc is > empty)? How should we create cached plans when we don't have raw > parsetrees? > Currently we can create cached plans without raw parsetrees, but this > means that plan revalidation doesn't work, choose_custom_plan() > always returns false and we get generic plan. Perhaps, we need some form > of GetCachedPlan(), which ignores raw_parse_tree? I don't think you need a new form of GetCachedPlan(). Instead, it seems that StmtPlanRequiresRevalidation() should be revised. As I got from comments and the d8b2fcc9d4 commit message, the primary goal was to skip revalidation of utility statements. Skipping revalidation was a positive side effect, as long as we didn't support custom plans for them anyway. But as you're going to change this, StmtPlanRequiresRevalidation() needs to be revised. I also think it's not necessary to implement long-lived plan cache in the initial patch. The work could be split into two patches. The first could implement query lifetime plan cache. This is beneficial already by itself as you've shown by example. The second could implement long-lived plan cache. I appreciate your work in this direction. I hope you got the feedback to go ahead and work on remaining issues. ------ Regards, Alexander Korotkov Supabase
Hi
út 31. 12. 2024 v 16:36 odesílatel Alexander Pyhalov <a.pyhalov@postgrespro.ru> napsal:
Hi.
>> What should we do with "pre-parsed" SQL functions (when prosrc is
>> empty)? How should we create cached plans when we don't have raw
>> parsetrees?
>> Currently we can create cached plans without raw parsetrees, but this
>> means that plan revalidation doesn't work, choose_custom_plan()
>> always returns false and we get generic plan. Perhaps, we need some
>> form
>> of GetCachedPlan(), which ignores raw_parse_tree?
>
> I don't think you need a new form of GetCachedPlan(). Instead, it
> seems that StmtPlanRequiresRevalidation() should be revised. As I got
> from comments and the d8b2fcc9d4 commit message, the primary goal was
> to skip revalidation of utility statements. Skipping revalidation was
> a positive side effect, as long as we didn't support custom plans for
> them anyway. But as you're going to change this,
> StmtPlanRequiresRevalidation() needs to be revised.
>
Thanks for feedback.
I've modifed StmtPlanRequiresRevalidation() so that it looks on queries
command type.
Not sure if it's enough or I have to recreate something more similar to
stmt_requires_parse_analysis()
logic. I was wondering if this can lead to triggering plan revalidation
in RevalidateCachedQuery().
I suppose not - as we plan in executor (so shouldn't catch userid change
or see some changes in
related objects. Revalidation would kill our plan, destroying
resultDesc.
Also while looking at this, fixed processing of instead of rules (which
would lead to NULL execution_state).
--
there are lot of fails found by tester
Please, can you check it?
regards
Pavel
Best regards,
Alexander Pyhalov,
Postgres Professional
Hi
čt 30. 1. 2025 v 9:50 odesílatel Alexander Pyhalov <a.pyhalov@postgrespro.ru> napsal:
Alexander Pyhalov писал(а) 2025-01-29 17:35:
> Tom Lane писал(а) 2025-01-17 21:27:
>> Alexander Pyhalov <a.pyhalov@postgrespro.ru> writes:
>>> I've rebased patch on master. Tests pass here.
>>
>> The cfbot still doesn't like it; my guess is that you built without
>> --with-libxml and so didn't notice the effects on xml.out.
>
> Hi. Thank you for review.
>
> I've updated patch.
Sorry, missed one local patch to fix memory bloat during replaning. Also
fixed a compiler warning.
Did you do some performance checks?
I tried some worst case
CREATE OR REPLACE FUNCTION fx(int)
RETURNS int AS $$
SELECT $1 + $1
$$ LANGUAGE SQL IMMUTABLE;
CREATE OR REPLACE FUNCTION fx2(int)
RETURNS int AS $$
SELECT $1 * 2
$$ LANGUAGE SQL IMMUTABLE;
do $$
begin
for i in 1..1000000 loop
perform fx(i); -- or fx2
end loop;
end;
$$;
DO
begin
for i in 1..1000000 loop
perform fx(i); -- or fx2
end loop;
end;
$$;
DO
The patched version reduces the difference between execution fx and fx2, but patched version is about 10% slower than unpatched.
The overhead of plan cache looks significant for simple cases (and a lot of SQL functions are very simple).
Regards
Pavel
--
Best regards,
Alexander Pyhalov,
Postgres Professional
Pavel Stehule <pavel.stehule@gmail.com> writes: > Did you do some performance checks? This is a good question to ask ... > I tried some worst case > CREATE OR REPLACE FUNCTION fx(int) > RETURNS int AS $$ > SELECT $1 + $1 > $$ LANGUAGE SQL IMMUTABLE; ... but I don't think tests like this will give helpful answers. That function is simple enough to be inlined: regression=# explain verbose select fx(f1) from int4_tbl; QUERY PLAN --------------------------------------------------------------- Seq Scan on public.int4_tbl (cost=0.00..1.06 rows=5 width=4) Output: (f1 + f1) (2 rows) So functions.c shouldn't have any involvement at all in the actually-executed PERFORM expression, and whatever difference you measured must have been noise. (If the effect *is* real, we'd better find out why.) You need to test with a non-inline-able function. Looking at the inlining conditions in inline_function(), one simple hack is to make the function return SETOF. That'll only exercise the returns-set path in functions.c though, so it'd be advisable to check other inline-blocking conditions too. regards, tom lane
Hi
I did multiple benchmarking, and still looks so the proposed patch doesn't help and has significant overhead
testcase:
create or replace function fx(int) returns int as $$ select $1 + $1; $$ language sql immutable;
create or replace function fx2(int) returns int as $$ select 2 * $1; $$ language sql immutable;
I tested
do $$
begin
for i in 1..1000000 loop
perform fx((random()*100)::int); -- or fx2
end loop;
end;
$$;
begin
for i in 1..1000000 loop
perform fx((random()*100)::int); -- or fx2
end loop;
end;
$$;
Results (master, patched):
fx: 17067 ms, 22165 ms
fx2: 2234 ms, 2311 ms
the execution of dynamic sql
2025-02-03 18:47:33) postgres=# do $$
begin
for i in 1..1000000 loop
execute 'select $1 + $1' using (random()*100)::int;
end loop;
end;
$$;
DO
Time: 13412.990 ms (00:13.413)
begin
for i in 1..1000000 loop
execute 'select $1 + $1' using (random()*100)::int;
end loop;
end;
$$;
DO
Time: 13412.990 ms (00:13.413)
In the profiler I see a significant overhead of the parser, so it looks like there is some more (overhead), but plan cache is not used.
Please, can somebody recheck my tests?
Regards
Pavel
Pavel Stehule <pavel.stehule@gmail.com> writes: > I did multiple benchmarking, and still looks so the proposed patch doesn't > help and has significant overhead Yeah, your fx() test case is clearly worse. For me, HEAD: regression=# do $$ begin for i in 1..1000000 loop perform fx((random()*100)::int); -- or fx2 end loop; end; $$; DO Time: 5229.184 ms (00:05.229) PATCH: regression=# do $$ begin for i in 1..1000000 loop perform fx((random()*100)::int); -- or fx2 end loop; end; $$; DO Time: 6934.413 ms (00:06.934) Adding some debug printout shows me that BuildCachedPlan is called to construct a custom plan on every single execution, which is presumably because the patch doesn't make any attempt to carry plancache state across successive executions of the same query. If we were saving that state it would have soon switched to a generic plan and then won big. So, even though I thought we could leave that for later, it seems like maybe we have to have it before we'll have a committable patch. There might be some residual inefficiency in there though. In the unpatched code we'd be calling pg_parse_query and pg_plan_query once per execution. You'd think that would cost more than BuildCachedPlan, which can skip the raw-parsing part. Even more interesting, the patch gets slower yet if we use a new-style SQL function: regression=# create or replace function fx3 (int) returns int immutable regression-# begin atomic select $1 + $1; end; CREATE FUNCTION Time: 0.813 ms regression=# do $$ begin for i in 1..1000000 loop perform fx3((random()*100)::int); -- or fx2 end loop; end; $$; DO Time: 8007.062 ms (00:08.007) That makes no sense either, because with a new-style SQL function we should be skipping parse analysis as well. But wait: HEAD takes Time: 6632.709 ms (00:06.633) to do the same thing. So somehow the new-style SQL function stuff is very materially slower in this use-case, with or without this patch. I do not understand why. Definitely some performance investigation needs to be done here. Even without cross-query plan caching, I don't see why the patch isn't better than it is. It ought to be at least competitive with the unpatched code. (I've not read the v5 patch yet, so I have no theories.) regards, tom lane
I wrote: > But wait: HEAD takes > Time: 6632.709 ms (00:06.633) > to do the same thing. So somehow the new-style SQL function > stuff is very materially slower in this use-case, with or > without this patch. I do not understand why. "perf" tells me that in the fx3 test, a full third of the runtime is spent inside stringToNode(), with about three-quarters of that going into pg_strtok(). This makes no sense to me: we'll be reading the prosqlbody of fx3(), sure, but that's not enormously long (about 1200 bytes). And pg_strtok() doesn't look remarkably slow. There's no way this should be taking more time than raw parsing + parse analysis, even for such a trivial query as "select $1 + $1". There's been some talk of getting rid of our existing nodetree storage format in favor of something more efficient. Maybe we should put a higher priority on getting that done. But anyway, that seems orthogonal to the current patch. > Even without cross-query plan caching, I don't see why the > patch isn't better than it is. It ought to be at least > competitive with the unpatched code. This remains true. regards, tom lane
hI
I can confirm 60% speedup for execution of function fx and fx3 - both functions are very primitive, so for real code the benefit can be higher
Unfortunately, there is about 5% slowdown for inlined code, and for just plpgsql code too.
I tested fx4
create or replace function fx4(int) returns int immutable as $$ begin return $1 + $1; end $$ language plpgsql;
and fx2
create or replace function fx2(int) returns int as $$ select 2 * $1; $$
language sql immutable;
language sql immutable;
and execution of patched code is about 5% slower. It is strange so this patch has a negative impact on plpgsql execution.
Regards
Pavel
Pavel Stehule писал(а) 2025-02-26 22:34: > hI > > I can confirm 60% speedup for execution of function fx and fx3 - both > functions are very primitive, so for real code the benefit can be > higher > > Unfortunately, there is about 5% slowdown for inlined code, and for > just plpgsql code too. > > I tested fx4 > > create or replace function fx4(int) returns int immutable as $$ begin > return $1 + $1; end $$ language plpgsql; > > and fx2 > > create or replace function fx2(int) returns int as $$ select 2 * $1; > $$ > language sql immutable; > > and execution of patched code is about 5% slower. It is strange so > this patch has a negative impact on plpgsql execution. > > Regards > > Pavel Hi. I've tried to reproduce slowdown and couldn't. create or replace function fx4(int) returns int immutable as $$ begin return $1 + $1; end $$ language plpgsql; do $$ begin for i in 1..5000000 loop perform fx4((random()*100)::int); -- or fx2 end loop; end; $$; OLD results: Time: 8268.614 ms (00:08.269) Time: 8178.836 ms (00:08.179) Time: 8306.694 ms (00:08.307) New (patched) results: Time: 7743.945 ms (00:07.744) Time: 7803.109 ms (00:07.803) Time: 7736.735 ms (00:07.737) Not sure why new is faster (perhaps, some noise?) - looking at perf flamegraphs I don't see something evident. create or replace function fx2(int) returns int as $$ select 2 * $1; $$ language sql immutable; do $$ begin for i in 1..5000000 loop perform fx2((random()*100)::int); -- or fx2 end loop; end; $$; OLD results: Time: 5346.471 ms (00:05.346) Time: 5359.222 ms (00:05.359) Time: 5316.747 ms (00:05.317) New (patched) results: Time: 5188.363 ms (00:05.188) Time: 5225.322 ms (00:05.225) Time: 5203.667 ms (00:05.204) -- Best regards, Alexander Pyhalov, Postgres Professional
čt 27. 2. 2025 v 13:25 odesílatel Alexander Pyhalov <a.pyhalov@postgrespro.ru> napsal:
Pavel Stehule писал(а) 2025-02-26 22:34:
> hI
>
> I can confirm 60% speedup for execution of function fx and fx3 - both
> functions are very primitive, so for real code the benefit can be
> higher
>
> Unfortunately, there is about 5% slowdown for inlined code, and for
> just plpgsql code too.
>
> I tested fx4
>
> create or replace function fx4(int) returns int immutable as $$ begin
> return $1 + $1; end $$ language plpgsql;
>
> and fx2
>
> create or replace function fx2(int) returns int as $$ select 2 * $1;
> $$
> language sql immutable;
>
> and execution of patched code is about 5% slower. It is strange so
> this patch has a negative impact on plpgsql execution.
>
> Regards
>
> Pavel
Hi. I've tried to reproduce slowdown and couldn't.
create or replace function fx4(int) returns int immutable as $$ begin
return $1 + $1; end $$ language plpgsql;
do $$
begin
for i in 1..5000000 loop
perform fx4((random()*100)::int); -- or fx2
end loop;
end;
$$;
OLD results:
Time: 8268.614 ms (00:08.269)
Time: 8178.836 ms (00:08.179)
Time: 8306.694 ms (00:08.307)
New (patched) results:
Time: 7743.945 ms (00:07.744)
Time: 7803.109 ms (00:07.803)
Time: 7736.735 ms (00:07.737)
Not sure why new is faster (perhaps, some noise?) - looking at perf
flamegraphs I don't see something evident.
create or replace function fx2(int) returns int as $$ select 2 * $1; $$
language sql immutable;
do $$
begin
for i in 1..5000000 loop
perform fx2((random()*100)::int); -- or fx2
end loop;
end;
$$;
OLD results:
Time: 5346.471 ms (00:05.346)
Time: 5359.222 ms (00:05.359)
Time: 5316.747 ms (00:05.317)
New (patched) results:
Time: 5188.363 ms (00:05.188)
Time: 5225.322 ms (00:05.225)
Time: 5203.667 ms (00:05.204)
I'll try to get profiles.
Regards
Pavel
--
Best regards,
Alexander Pyhalov,
Postgres Professional
Pavel Stehule <pavel.stehule@gmail.com> writes: > čt 27. 2. 2025 v 13:25 odesílatel Alexander Pyhalov < > a.pyhalov@postgrespro.ru> napsal: >>> Unfortunately, there is about 5% slowdown for inlined code, and for >>> just plpgsql code too. >> Hi. I've tried to reproduce slowdown and couldn't. > I'll try to get profiles. I tried to reproduce this too. What I got on my usual development workstation (RHEL8/gcc 8.5.0 on x86_64) was: fx2 example: v6 patch about 2.4% slower than HEAD fx4 example: v6 patch about 7.3% slower than HEAD I was quite concerned after that result, but then I tried it on another machine (macOS/clang 16.0.0 on Apple M1) and got: fx2 example: v6 patch about 0.2% slower than HEAD fx4 example: v6 patch about 0.7% faster than HEAD (These are average-of-three-runs tests on --disable-cassert builds; I trust you guys were not doing performance tests on assert-enabled builds?) So taken together, our results are all over the map, anywhere from 7% speedup to 7% slowdown. My usual rule of thumb is that you can see up to 2% variation in this kind of microbenchmark even when "nothing has changed", just due to random build details like whether critical loops cross a cacheline or not. 7% is pretty well above that threshold, but maybe it's just random build variation anyway. Furthermore, since neither example involves functions.c at all (fx2 would be inlined, and fx4 isn't SQL-language), it's hard to see how the patch would directly affect either example unless it were adding overhead to plancache.c. And I don't see any changes there that would amount to meaningful overhead for the existing use-case with a raw parse tree. So right at the moment I'm inclined to write this off as measurement noise. Perhaps it'd be worth checking a few more platforms, though. regards, tom lane
čt 27. 2. 2025 v 20:52 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:
Pavel Stehule <pavel.stehule@gmail.com> writes:
> čt 27. 2. 2025 v 13:25 odesílatel Alexander Pyhalov <
> a.pyhalov@postgrespro.ru> napsal:
>>> Unfortunately, there is about 5% slowdown for inlined code, and for
>>> just plpgsql code too.
>> Hi. I've tried to reproduce slowdown and couldn't.
> I'll try to get profiles.
I tried to reproduce this too. What I got on my usual development
workstation (RHEL8/gcc 8.5.0 on x86_64) was:
fx2 example: v6 patch about 2.4% slower than HEAD
fx4 example: v6 patch about 7.3% slower than HEAD
I was quite concerned after that result, but then I tried it on
another machine (macOS/clang 16.0.0 on Apple M1) and got:
fx2 example: v6 patch about 0.2% slower than HEAD
fx4 example: v6 patch about 0.7% faster than HEAD
(These are average-of-three-runs tests on --disable-cassert
builds; I trust you guys were not doing performance tests on
assert-enabled builds?)
So taken together, our results are all over the map, anywhere
from 7% speedup to 7% slowdown. My usual rule of thumb is that
Where do you see 7% speedup? Few lines up you wrote 0.7% faster.
you can see up to 2% variation in this kind of microbenchmark even
when "nothing has changed", just due to random build details like
whether critical loops cross a cacheline or not. 7% is pretty
well above that threshold, but maybe it's just random build
variation anyway.
Furthermore, since neither example involves functions.c at all
(fx2 would be inlined, and fx4 isn't SQL-language), it's hard
to see how the patch would directly affect either example unless
it were adding overhead to plancache.c. And I don't see any
changes there that would amount to meaningful overhead for the
existing use-case with a raw parse tree.
So right at the moment I'm inclined to write this off as
measurement noise. Perhaps it'd be worth checking a few
more platforms, though.
regards, tom lane
Alexander Pyhalov <a.pyhalov@postgrespro.ru> writes: > Now sql functions plans are actually saved. The most of it is a > simplified version of plpgsql plan cache. Perhaps, I've missed > something. A couple of thoughts about v6: * I don't think it's okay to just summarily do this: /* It's stale; unlink and delete */ fcinfo->flinfo->fn_extra = NULL; MemoryContextDelete(fcache->fcontext); fcache = NULL; when fmgr_sql sees that the cache is stale. If we're doing a nested call of a recursive SQL function, this'd be cutting the legs out from under the outer recursion level. plpgsql goes to some lengths to do reference-counting of function cache entries, and I think you need the same here. * I don't like much of anything about 0004. It's messy and it gives up all the benefit of plan caching in some pretty-common cases (anywhere where the user was sloppy about what data type is being returned). I wonder if we couldn't solve that with more finesse by applying check_sql_fn_retval() to the query tree after parse analysis, but before we hand it to plancache.c? This is different from what happens now because we'd be applying it before not after query rewrite, but I don't think that query rewrite ever changes the targetlist results. Another point is that the resultTargetList output might change subtly, but I don't think we care there either: I believe we only examine that output for its result data types and resjunk markers. (This is nonobvious and inadequately documented, but for sure we couldn't try to execute that tlist --- it's never passed through the planner.) * One diff that caught my eye was - if (!ActiveSnapshotSet() && - plansource->raw_parse_tree && - analyze_requires_snapshot(plansource->raw_parse_tree)) + if (!ActiveSnapshotSet() && StmtPlanRequiresRevalidation(plansource)) Because StmtPlanRequiresRevalidation uses stmt_requires_parse_analysis, this is basically throwing away the distinction between stmt_requires_parse_analysis and analyze_requires_snapshot. I do not think that's okay, for the reasons explained in analyze_requires_snapshot. We should expend the additional notational overhead to keep those things separate. * I'm also not thrilled by teaching StmtPlanRequiresRevalidation how to do something equivalent to stmt_requires_parse_analysis for Query trees. The entire point of the current division of labor is for it *not* to know that, but keep the knowledge of the properties of different statement types in parser/analyze.c. So it looks to me like we need to add a function to parser/analyze.c that does this. Not quite sure what to call it though. querytree_requires_parse_analysis() would be a weird name, because if it's a Query tree then it's already been through parse analysis. Maybe querytree_requires_revalidation()? regards, tom lane
Pavel Stehule <pavel.stehule@gmail.com> writes: > čt 27. 2. 2025 v 20:52 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal: >> So taken together, our results are all over the map, anywhere >> from 7% speedup to 7% slowdown. My usual rule of thumb is that > Where do you see 7% speedup? Few lines up you wrote 0.7% faster. Alexander got that on the fx4 case, according to his response a few messages ago [1]. It'd be good if someone else could reproduce that, because right now we have two "it's slower" results versus only one "it's faster". regards, tom lane [1] https://www.postgresql.org/message-id/e5724d1ba8398c7ff20ead1de73b4db4%40postgrespro.ru
Hi
čt 27. 2. 2025 v 21:45 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:
Pavel Stehule <pavel.stehule@gmail.com> writes:
> čt 27. 2. 2025 v 20:52 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:
>> So taken together, our results are all over the map, anywhere
>> from 7% speedup to 7% slowdown. My usual rule of thumb is that
> Where do you see 7% speedup? Few lines up you wrote 0.7% faster.
Alexander got that on the fx4 case, according to his response a
few messages ago [1]. It'd be good if someone else could reproduce
that, because right now we have two "it's slower" results versus
only one "it's faster".
ok
here is a profile from master
6.98% postgres postgres [.] hash_bytes
6.30% postgres postgres [.] palloc0
3.57% postgres postgres [.] SearchCatCacheInternal
3.29% postgres postgres [.] AllocSetAlloc
2.65% postgres plpgsql.so [.] exec_stmts
2.55% postgres postgres [.] expression_tree_walker_impl
2.34% postgres postgres [.] _SPI_execute_plan
2.13% postgres postgres [.] CheckExprStillValid
2.02% postgres postgres [.] fmgr_info_cxt_security
1.89% postgres postgres [.] ExecInitFunc
1.51% postgres postgres [.] ExecInterpExpr
1.48% postgres postgres [.] ResourceOwnerForget
1.44% postgres postgres [.] AllocSetReset
1.35% postgres postgres [.] MemoryContextCreate
1.30% postgres plpgsql.so [.] plpgsql_exec_function
1.29% postgres libc.so.6 [.] __memcmp_sse2
1.24% postgres postgres [.] MemoryContextDelete
1.13% postgres postgres [.] check_stack_depth
1.11% postgres postgres [.] AllocSetContextCreateInternal
1.09% postgres postgres [.] resolve_polymorphic_argtypes
1.08% postgres postgres [.] hash_search_with_hash_value
1.07% postgres postgres [.] standard_ExecutorStart
1.07% postgres postgres [.] ExprEvalPushStep
1.04% postgres postgres [.] ExecInitExprRec
0.95% postgres plpgsql.so [.] plpgsql_estate_setup
0.91% postgres postgres [.] ExecReadyInterpretedExp
6.30% postgres postgres [.] palloc0
3.57% postgres postgres [.] SearchCatCacheInternal
3.29% postgres postgres [.] AllocSetAlloc
2.65% postgres plpgsql.so [.] exec_stmts
2.55% postgres postgres [.] expression_tree_walker_impl
2.34% postgres postgres [.] _SPI_execute_plan
2.13% postgres postgres [.] CheckExprStillValid
2.02% postgres postgres [.] fmgr_info_cxt_security
1.89% postgres postgres [.] ExecInitFunc
1.51% postgres postgres [.] ExecInterpExpr
1.48% postgres postgres [.] ResourceOwnerForget
1.44% postgres postgres [.] AllocSetReset
1.35% postgres postgres [.] MemoryContextCreate
1.30% postgres plpgsql.so [.] plpgsql_exec_function
1.29% postgres libc.so.6 [.] __memcmp_sse2
1.24% postgres postgres [.] MemoryContextDelete
1.13% postgres postgres [.] check_stack_depth
1.11% postgres postgres [.] AllocSetContextCreateInternal
1.09% postgres postgres [.] resolve_polymorphic_argtypes
1.08% postgres postgres [.] hash_search_with_hash_value
1.07% postgres postgres [.] standard_ExecutorStart
1.07% postgres postgres [.] ExprEvalPushStep
1.04% postgres postgres [.] ExecInitExprRec
0.95% postgres plpgsql.so [.] plpgsql_estate_setup
0.91% postgres postgres [.] ExecReadyInterpretedExp
and from patched
7.08% postgres postgres [.] hash_bytes
6.25% postgres postgres [.] palloc0
3.52% postgres postgres [.] SearchCatCacheInternal
3.30% postgres postgres [.] AllocSetAlloc
2.39% postgres postgres [.] expression_tree_walker_impl
2.37% postgres plpgsql.so [.] exec_stmts
2.15% postgres postgres [.] _SPI_execute_plan
2.10% postgres postgres [.] CheckExprStillValid
1.94% postgres postgres [.] fmgr_info_cxt_security
1.71% postgres postgres [.] ExecInitFunc
1.41% postgres postgres [.] AllocSetReset
1.40% postgres postgres [.] ExecInterpExpr
1.38% postgres postgres [.] ExprEvalPushStep
1.34% postgres postgres [.] ResourceOwnerForget
1.31% postgres postgres [.] MemoryContextDelete
1.24% postgres libc.so.6 [.] __memcmp_sse2
1.21% postgres postgres [.] MemoryContextCreate
1.18% postgres postgres [.] AllocSetContextCreateInternal
1.17% postgres postgres [.] hash_search_with_hash_value
1.13% postgres postgres [.] resolve_polymorphic_argtypes
1.13% postgres plpgsql.so [.] plpgsql_exec_function
1.03% postgres postgres [.] standard_ExecutorStart
0.98% postgres postgres [.] ExecInitExprRec
0.96% postgres postgres [.] check_stack_depth
6.25% postgres postgres [.] palloc0
3.52% postgres postgres [.] SearchCatCacheInternal
3.30% postgres postgres [.] AllocSetAlloc
2.39% postgres postgres [.] expression_tree_walker_impl
2.37% postgres plpgsql.so [.] exec_stmts
2.15% postgres postgres [.] _SPI_execute_plan
2.10% postgres postgres [.] CheckExprStillValid
1.94% postgres postgres [.] fmgr_info_cxt_security
1.71% postgres postgres [.] ExecInitFunc
1.41% postgres postgres [.] AllocSetReset
1.40% postgres postgres [.] ExecInterpExpr
1.38% postgres postgres [.] ExprEvalPushStep
1.34% postgres postgres [.] ResourceOwnerForget
1.31% postgres postgres [.] MemoryContextDelete
1.24% postgres libc.so.6 [.] __memcmp_sse2
1.21% postgres postgres [.] MemoryContextCreate
1.18% postgres postgres [.] AllocSetContextCreateInternal
1.17% postgres postgres [.] hash_search_with_hash_value
1.13% postgres postgres [.] resolve_polymorphic_argtypes
1.13% postgres plpgsql.so [.] plpgsql_exec_function
1.03% postgres postgres [.] standard_ExecutorStart
0.98% postgres postgres [.] ExecInitExprRec
0.96% postgres postgres [.] check_stack_depth
looks so there is only one significant differences
ExprEvalPushStep 1.07 x 1.38%
Regards
Pavel
compiled without assertions on gcc 15 with 02
vendor_id : GenuineIntel
cpu family : 6
model : 42
model name : Intel(R) Core(TM) i7-2630QM CPU @ 2.00GHz
stepping : 7
microcode : 0x2f
cpu MHz : 2691.102
cache size : 6144 KB
cpu family : 6
model : 42
model name : Intel(R) Core(TM) i7-2630QM CPU @ 2.00GHz
stepping : 7
microcode : 0x2f
cpu MHz : 2691.102
cache size : 6144 KB
regards, tom lane
[1] https://www.postgresql.org/message-id/e5724d1ba8398c7ff20ead1de73b4db4%40postgrespro.ru
Hi
pá 28. 2. 2025 v 7:29 odesílatel Pavel Stehule <pavel.stehule@gmail.com> napsal:
Hičt 27. 2. 2025 v 21:45 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:Pavel Stehule <pavel.stehule@gmail.com> writes:
> čt 27. 2. 2025 v 20:52 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:
>> So taken together, our results are all over the map, anywhere
>> from 7% speedup to 7% slowdown. My usual rule of thumb is that
> Where do you see 7% speedup? Few lines up you wrote 0.7% faster.
Alexander got that on the fx4 case, according to his response a
few messages ago [1]. It'd be good if someone else could reproduce
that, because right now we have two "it's slower" results versus
only one "it's faster".okhere is a profile from master6.98% postgres postgres [.] hash_bytes
6.30% postgres postgres [.] palloc0
3.57% postgres postgres [.] SearchCatCacheInternal
3.29% postgres postgres [.] AllocSetAlloc
2.65% postgres plpgsql.so [.] exec_stmts
2.55% postgres postgres [.] expression_tree_walker_impl
2.34% postgres postgres [.] _SPI_execute_plan
2.13% postgres postgres [.] CheckExprStillValid
2.02% postgres postgres [.] fmgr_info_cxt_security
1.89% postgres postgres [.] ExecInitFunc
1.51% postgres postgres [.] ExecInterpExpr
1.48% postgres postgres [.] ResourceOwnerForget
1.44% postgres postgres [.] AllocSetReset
1.35% postgres postgres [.] MemoryContextCreate
1.30% postgres plpgsql.so [.] plpgsql_exec_function
1.29% postgres libc.so.6 [.] __memcmp_sse2
1.24% postgres postgres [.] MemoryContextDelete
1.13% postgres postgres [.] check_stack_depth
1.11% postgres postgres [.] AllocSetContextCreateInternal
1.09% postgres postgres [.] resolve_polymorphic_argtypes
1.08% postgres postgres [.] hash_search_with_hash_value
1.07% postgres postgres [.] standard_ExecutorStart
1.07% postgres postgres [.] ExprEvalPushStep
1.04% postgres postgres [.] ExecInitExprRec
0.95% postgres plpgsql.so [.] plpgsql_estate_setup
0.91% postgres postgres [.] ExecReadyInterpretedExpand from patched7.08% postgres postgres [.] hash_bytes
6.25% postgres postgres [.] palloc0
3.52% postgres postgres [.] SearchCatCacheInternal
3.30% postgres postgres [.] AllocSetAlloc
2.39% postgres postgres [.] expression_tree_walker_impl
2.37% postgres plpgsql.so [.] exec_stmts
2.15% postgres postgres [.] _SPI_execute_plan
2.10% postgres postgres [.] CheckExprStillValid
1.94% postgres postgres [.] fmgr_info_cxt_security
1.71% postgres postgres [.] ExecInitFunc
1.41% postgres postgres [.] AllocSetReset
1.40% postgres postgres [.] ExecInterpExpr
1.38% postgres postgres [.] ExprEvalPushStep
1.34% postgres postgres [.] ResourceOwnerForget
1.31% postgres postgres [.] MemoryContextDelete
1.24% postgres libc.so.6 [.] __memcmp_sse2
1.21% postgres postgres [.] MemoryContextCreate
1.18% postgres postgres [.] AllocSetContextCreateInternal
1.17% postgres postgres [.] hash_search_with_hash_value
1.13% postgres postgres [.] resolve_polymorphic_argtypes
1.13% postgres plpgsql.so [.] plpgsql_exec_function
1.03% postgres postgres [.] standard_ExecutorStart
0.98% postgres postgres [.] ExecInitExprRec
0.96% postgres postgres [.] check_stack_depthlooks so there is only one significant differencesExprEvalPushStep 1.07 x 1.38%RegardsPavelcompiled without assertions on gcc 15 with 02vendor_id : GenuineIntel
cpu family : 6
model : 42
model name : Intel(R) Core(TM) i7-2630QM CPU @ 2.00GHz
stepping : 7
microcode : 0x2f
cpu MHz : 2691.102
cache size : 6144 KB
I tested the patches on another notebook with more recent cpu
vendor_id : GenuineIntel
cpu family : 6
model : 154
model name : 12th Gen Intel(R) Core(TM) i7-12700H
stepping : 3
microcode : 0x436
cpu MHz : 400.000
cache size : 24576 KB
cpu family : 6
model : 154
model name : 12th Gen Intel(R) Core(TM) i7-12700H
stepping : 3
microcode : 0x436
cpu MHz : 400.000
cache size : 24576 KB
And the difference are smaller - about 3%
Regards
Pavel
regards, tom lane
[1] https://www.postgresql.org/message-id/e5724d1ba8398c7ff20ead1de73b4db4%40postgrespro.ru