Thread: Inlining of couple of functions in pl_exec.c improves performance
There are a couple of function call overheads I observed in pl/pgsql code : exec_stmt() and exec_cast_value(). Removing these overheads resulted in some performance gains. exec_stmt() : plpgsql_exec_function() and other toplevel block executors currently call exec_stmt(). But actually they don't need to do everything that exec_stmt() does. So they can call a new function instead of exec_stmt(), and all the exec_stmt() code can be moved to exec_stmts(). The things that exec_stmt() do, but are not necessary for a top level block stmt, are : 1. save_estmt = estate->err_stmt; estate->err_stmt = stmt; For top level blocks, saving the estate->err_stmt is not necessary, because there is no statement after this block statement. Anyways, plpgsql_exec_function() assigns estate.err_stmt just before calling exec_stmt so there is really no point in exec_stmt() setting it again. 2. CHECK_FOR_INTERRUPTS() This is not necessary for toplevel block callers. 3. exec_stmt_block() can be directly called rather than exec_stmt() because func->action is a block statement. So the switch statement is not necessary. But this one might be necessary for toplevel block statement: if (*plpgsql_plugin_ptr && (*plpgsql_plugin_ptr)->stmt_beg) ((*plpgsql_plugin_ptr)->stmt_beg) (estate, stmt); There was already a repetitive code in plpgsql_exec_function() and other functions around the exec_stmt() call. So in a separate patch 0001*.patch, I moved that code into a common function exec_toplevel_block(). In the main patch 0002-Get-rid-of-exec_stmt-function-call.patch, I additionally called plpgsql_plugin_ptr->stmt_beg() inside exec_toplevel_block(). And moved exec_stmt() code into exec_stmts(). exec_cast_value() : This function does not do the casting if not required. So moved the code that actually does the cast into a separate function, so as to reduce the exec_cast_value() code and make it inline. Attached is the 0003-Inline-exec_cast_value.patch Testing ---------- I used two available VMs (one x86_64 and the other arm64), and the benefit showed up on both of these machines. Attached patches 0001, 0002, 0003 are to be applied in that order. 0001 is just a preparatory patch. First I tried with a simple for loop with a single assignment (attached forcounter.sql) By inlining of the two functions, found noticeable reduction in execution time as shown (figures are in milliseconds, averaged over multiple runs; taken from 'explain analyze' execution times) : ARM VM : HEAD : 100 ; Patched : 88 => 13.6% improvement x86 VM : HEAD : 71 ; Patched : 66 => 7.63% improvement. Then I included many assignment statements as shown in attachment assignmany.sql. This showed further benefit : ARM VM : HEAD : 1820 ; Patched : 1549 => 17.5% improvement x86 VM : HEAD : 1020 ; Patched : 869 => 17.4% improvement Inlining just exec_stmt() showed the improvement mainly on the arm64 VM (7.4%). For x86, it was 2.7% But inlining exec_stmt() and exec_cast_value() together showed benefits on both machines, as can be seen above. -- Thanks, -Amit Khandekar Huawei Technologies
Attachment
Hi
so 23. 5. 2020 v 19:03 odesílatel Amit Khandekar <amitdkhan.pg@gmail.com> napsal:
There are a couple of function call overheads I observed in pl/pgsql
code : exec_stmt() and exec_cast_value(). Removing these overheads
resulted in some performance gains.
exec_stmt() :
plpgsql_exec_function() and other toplevel block executors currently
call exec_stmt(). But actually they don't need to do everything that
exec_stmt() does. So they can call a new function instead of
exec_stmt(), and all the exec_stmt() code can be moved to
exec_stmts(). The things that exec_stmt() do, but are not necessary
for a top level block stmt, are :
1. save_estmt = estate->err_stmt; estate->err_stmt = stmt;
For top level blocks, saving the estate->err_stmt is not necessary,
because there is no statement after this block statement. Anyways,
plpgsql_exec_function() assigns estate.err_stmt just before calling
exec_stmt so there is really no point in exec_stmt() setting it again.
2. CHECK_FOR_INTERRUPTS()
This is not necessary for toplevel block callers.
3. exec_stmt_block() can be directly called rather than exec_stmt()
because func->action is a block statement. So the switch statement is
not necessary.
But this one might be necessary for toplevel block statement:
if (*plpgsql_plugin_ptr && (*plpgsql_plugin_ptr)->stmt_beg)
((*plpgsql_plugin_ptr)->stmt_beg) (estate, stmt);
There was already a repetitive code in plpgsql_exec_function() and
other functions around the exec_stmt() call. So in a separate patch
0001*.patch, I moved that code into a common function
exec_toplevel_block(). In the main patch
0002-Get-rid-of-exec_stmt-function-call.patch, I additionally called
plpgsql_plugin_ptr->stmt_beg() inside exec_toplevel_block(). And moved
exec_stmt() code into exec_stmts().
exec_cast_value() :
This function does not do the casting if not required. So moved the
code that actually does the cast into a separate function, so as to
reduce the exec_cast_value() code and make it inline. Attached is the
0003-Inline-exec_cast_value.patch
Testing
----------
I used two available VMs (one x86_64 and the other arm64), and the
benefit showed up on both of these machines. Attached patches 0001,
0002, 0003 are to be applied in that order. 0001 is just a preparatory
patch.
First I tried with a simple for loop with a single assignment
(attached forcounter.sql)
By inlining of the two functions, found noticeable reduction in
execution time as shown (figures are in milliseconds, averaged over
multiple runs; taken from 'explain analyze' execution times) :
ARM VM :
HEAD : 100 ; Patched : 88 => 13.6% improvement
x86 VM :
HEAD : 71 ; Patched : 66 => 7.63% improvement.
Then I included many assignment statements as shown in attachment
assignmany.sql. This showed further benefit :
ARM VM :
HEAD : 1820 ; Patched : 1549 => 17.5% improvement
x86 VM :
HEAD : 1020 ; Patched : 869 => 17.4% improvement
Inlining just exec_stmt() showed the improvement mainly on the arm64
VM (7.4%). For x86, it was 2.7%
But inlining exec_stmt() and exec_cast_value() together showed
benefits on both machines, as can be seen above.
FOR counter IN 1..1800000 LOOP
id = 0; id = 0; id1 = 0;
id2 = 0; id3 = 0; id1 = 0; id2 = 0;
id3 = 0; id = 0; id = 0; id1 = 0;
id2 = 0; id3 = 0; id1 = 0; id2 = 0;
id3 = 0;
END LOOP;
This is not too much typical PLpgSQL code. All expressions are not parametrized - so this test is little bit obscure.
Last strange performance plpgsql benchmark did calculation of pi value. It does something real
Regards
Pavel
--
Thanks,
-Amit Khandekar
Huawei Technologies
On Sat, 23 May 2020 at 23:24, Pavel Stehule <pavel.stehule@gmail.com> wrote: > > FOR counter IN 1..1800000 LOOP > id = 0; id = 0; id1 = 0; > id2 = 0; id3 = 0; id1 = 0; id2 = 0; > id3 = 0; id = 0; id = 0; id1 = 0; > id2 = 0; id3 = 0; id1 = 0; id2 = 0; > id3 = 0; > END LOOP; > > This is not too much typical PLpgSQL code. All expressions are not parametrized - so this test is little bit obscure. > > Last strange performance plpgsql benchmark did calculation of pi value. It does something real Yeah, basically I wanted to have many statements, and that too with many assignments where casts are not required. Let me check if I can come up with a real-enough testcase. Thanks.
On Tue, 26 May 2020 at 09:06, Amit Khandekar <amitdkhan.pg@gmail.com> wrote: > > On Sat, 23 May 2020 at 23:24, Pavel Stehule <pavel.stehule@gmail.com> wrote: > > > > FOR counter IN 1..1800000 LOOP > > id = 0; id = 0; id1 = 0; > > id2 = 0; id3 = 0; id1 = 0; id2 = 0; > > id3 = 0; id = 0; id = 0; id1 = 0; > > id2 = 0; id3 = 0; id1 = 0; id2 = 0; > > id3 = 0; > > END LOOP; > > > > This is not too much typical PLpgSQL code. All expressions are not parametrized - so this test is little bit obscure. > > > > Last strange performance plpgsql benchmark did calculation of pi value. It does something real > > Yeah, basically I wanted to have many statements, and that too with > many assignments where casts are not required. Let me check if I can > come up with a real-enough testcase. Thanks. create table tab (id int[]); insert into tab select array((select ((random() * 100000)::bigint) id from generate_series(1, 30000) order by 1)); insert into tab select array((select ((random() * 600000)::bigint) id from generate_series(1, 30000) order by 1)); insert into tab select array((select ((random() * 1000000)::bigint) id from generate_series(1, 30000) order by 1)); insert into tab select array((select ((random() * 100000)::bigint) id from generate_series(1, 30000) order by 1)); insert into tab select array((select ((random() * 600000)::bigint) id from generate_series(1, 30000) order by 1)); insert into tab select array((select ((random() * 1000000)::bigint) id from generate_series(1, 30000) order by 1)); insert into tab select array((select ((random() * 100000)::bigint) id from generate_series(1, 30000) order by 1)); insert into tab select array((select ((random() * 600000)::bigint) id from generate_series(1, 30000) order by 1)); insert into tab select array((select ((random() * 1000000)::bigint) id from generate_series(1, 30000) order by 1)); -- Return how much two consecutive array elements are apart from each other, on average; i.e. how much the numbers are spaced out. -- Input is an ordered array of integers. CREATE OR REPLACE FUNCTION avg_space(int[]) RETURNS bigint AS $$ DECLARE diff int = 0; num int; prevnum int = 1; BEGIN FOREACH num IN ARRAY $1 LOOP diff = diff + num - prevnum; prevnum = num; END LOOP; RETURN diff/array_length($1, 1); END; $$ LANGUAGE plpgsql; explain analyze select avg_space(id) from tab; Like earlier figures, these are execution times in milliseconds, taken from explain-analyze. ARM VM: HEAD : 49.8 patch 0001+0002 : 47.8 => 4.2% patch 0001+0002+0003 : 42.9 => 16.1% x86 VM: HEAD : 32.8 patch 0001+0002 : 32.7 => 0% patch 0001+0002+0003 : 28.0 => 17.1%
Hi
st 27. 5. 2020 v 13:31 odesílatel Amit Khandekar <amitdkhan.pg@gmail.com> napsal:
On Tue, 26 May 2020 at 09:06, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
>
> On Sat, 23 May 2020 at 23:24, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> >
> > FOR counter IN 1..1800000 LOOP
> > id = 0; id = 0; id1 = 0;
> > id2 = 0; id3 = 0; id1 = 0; id2 = 0;
> > id3 = 0; id = 0; id = 0; id1 = 0;
> > id2 = 0; id3 = 0; id1 = 0; id2 = 0;
> > id3 = 0;
> > END LOOP;
> >
> > This is not too much typical PLpgSQL code. All expressions are not parametrized - so this test is little bit obscure.
> >
> > Last strange performance plpgsql benchmark did calculation of pi value. It does something real
>
> Yeah, basically I wanted to have many statements, and that too with
> many assignments where casts are not required. Let me check if I can
> come up with a real-enough testcase. Thanks.
create table tab (id int[]);
insert into tab select array((select ((random() * 100000)::bigint) id
from generate_series(1, 30000) order by 1));
insert into tab select array((select ((random() * 600000)::bigint) id
from generate_series(1, 30000) order by 1));
insert into tab select array((select ((random() * 1000000)::bigint) id
from generate_series(1, 30000) order by 1));
insert into tab select array((select ((random() * 100000)::bigint) id
from generate_series(1, 30000) order by 1));
insert into tab select array((select ((random() * 600000)::bigint) id
from generate_series(1, 30000) order by 1));
insert into tab select array((select ((random() * 1000000)::bigint) id
from generate_series(1, 30000) order by 1));
insert into tab select array((select ((random() * 100000)::bigint) id
from generate_series(1, 30000) order by 1));
insert into tab select array((select ((random() * 600000)::bigint) id
from generate_series(1, 30000) order by 1));
insert into tab select array((select ((random() * 1000000)::bigint) id
from generate_series(1, 30000) order by 1));
-- Return how much two consecutive array elements are apart from each
other, on average; i.e. how much the numbers are spaced out.
-- Input is an ordered array of integers.
CREATE OR REPLACE FUNCTION avg_space(int[]) RETURNS bigint AS $$
DECLARE
diff int = 0;
num int;
prevnum int = 1;
BEGIN
FOREACH num IN ARRAY $1
LOOP
diff = diff + num - prevnum;
prevnum = num;
END LOOP;
RETURN diff/array_length($1, 1);
END;
$$ LANGUAGE plpgsql;
explain analyze select avg_space(id) from tab;
Like earlier figures, these are execution times in milliseconds, taken
from explain-analyze.
ARM VM:
HEAD : 49.8
patch 0001+0002 : 47.8 => 4.2%
patch 0001+0002+0003 : 42.9 => 16.1%
x86 VM:
HEAD : 32.8
patch 0001+0002 : 32.7 => 0%
patch 0001+0002+0003 : 28.0 => 17.1%
I tested these patches on my notebook - Lenovo T520 (x64) - on pi calculation
CREATE OR REPLACE FUNCTION pi_est_1(n int)
RETURNS numeric AS $$
DECLARE
accum double precision DEFAULT 1.0;
c1 double precision DEFAULT 2.0;
c2 double precision DEFAULT 1.0;
BEGIN
FOR i IN 1..n
LOOP
accum := accum * ((c1 * c1) / (c2 * (c2 + 2.0)));
c1 := c1 + 2.0;
c2 := c2 + 2.0;
END LOOP;
RETURN accum * 2.0;
END;
$$ LANGUAGE plpgsql;
RETURNS numeric AS $$
DECLARE
accum double precision DEFAULT 1.0;
c1 double precision DEFAULT 2.0;
c2 double precision DEFAULT 1.0;
BEGIN
FOR i IN 1..n
LOOP
accum := accum * ((c1 * c1) / (c2 * (c2 + 2.0)));
c1 := c1 + 2.0;
c2 := c2 + 2.0;
END LOOP;
RETURN accum * 2.0;
END;
$$ LANGUAGE plpgsql;
and I see about 3-5% of speedup
extra simply test shows
do $$ declare i int default 0; begin while i < 100000000 loop i := i + 1; end loop; raise notice 'i=%', i;end $$;
2% speedup
I don't see 17% anywhere, but 3-5% is not bad.
patch 0001 has sense and can help with code structure
patch 0002 it is little bit against simplicity, but for PLpgSQL with blocks structure it is correct.
patch 0003 has sense too
tested on Fedora 32 with gcc 10.1.1 and -O2 option
Regards
Pavel
On Thu, 28 May 2020 at 14:39, Pavel Stehule <pavel.stehule@gmail.com> wrote: > I don't see 17% anywhere, but 3-5% is not bad. Did you see 3-5% only for the pi function, or did you see the same improvement also for the functions that I wrote ? I was getting a consistent result of 14-18 % on both of the VMs. Also, is your test machine running on Windows ? All the machines I tested were on Linux kernel (Ubuntu) Below are my results for your pi_est_1() function. For this function, I am consistently getting 5-9 % improvement. I tested on 3 machines : gcc : 8.4.0. -O2 option OS : Ubuntu Bionic explain analyze select pi_est_1(10000000) 1. x86_64 laptop VM (Intel Core i7-8665U) HEAD : 2666 2617 2600 2630 Patched : 2502 2409 2460 2444 2. x86_64 VM (Xeon Gold 6151) HEAD : 1664 1662 1721 1660 Patched : 1541 1548 1537 1526 3. ARM64 VM (Kunpeng) HEAD : 2873 2864 2860 2861 Patched : 2568 2513 2501 2538 > > patch 0001 has sense and can help with code structure > patch 0002 it is little bit against simplicity, but for PLpgSQL with blocks structure it is correct. Here, I moved the exec_stmt code into exec_stmts() function because exec_stmts() was the only caller, and that function is not that big. I am assuming you were referring to this point when you said it is a bit against simplicity. But I didn't get what you implied by "but for PLpgSQL with blocks structure it is correct" -- Thanks, -Amit Khandekar Huawei Technologies
so 30. 5. 2020 v 7:28 odesílatel Amit Khandekar <amitdkhan.pg@gmail.com> napsal:
On Thu, 28 May 2020 at 14:39, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> I don't see 17% anywhere, but 3-5% is not bad.
Did you see 3-5% only for the pi function, or did you see the same
improvement also for the functions that I wrote ? I was getting a
consistent result of 14-18 % on both of the VMs. Also, is your test
machine running on Windows ? All the machines I tested were on Linux
kernel (Ubuntu)
It was similar with your example too.
I tested it on Linux Fedora Core 32 - laptop T520 - I7.
I think so the effect of these patches strongly depends on CPU and compiler - but it is micro optimization, and when I look to profiler, the bottle neck is elsewhere.
Below are my results for your pi_est_1() function. For this function,
I am consistently getting 5-9 % improvement. I tested on 3 machines :
gcc : 8.4.0. -O2 option
OS : Ubuntu Bionic
explain analyze select pi_est_1(10000000)
1. x86_64 laptop VM (Intel Core i7-8665U)
HEAD : 2666 2617 2600 2630
Patched : 2502 2409 2460 2444
2. x86_64 VM (Xeon Gold 6151)
HEAD : 1664 1662 1721 1660
Patched : 1541 1548 1537 1526
3. ARM64 VM (Kunpeng)
HEAD : 2873 2864 2860 2861
Patched : 2568 2513 2501 2538
>
> patch 0001 has sense and can help with code structure
> patch 0002 it is little bit against simplicity, but for PLpgSQL with blocks structure it is correct.
Here, I moved the exec_stmt code into exec_stmts() function because
exec_stmts() was the only caller, and that function is not that big. I
am assuming you were referring to this point when you said it is a bit
against simplicity. But I didn't get what you implied by "but for
PLpgSQL with blocks structure it is correct"
Nested statement in PLpgSQL is always a list of statements. It is not single statement ever. So is not too strange don't have a function execute_stmt.
Pavel
--
Thanks,
-Amit Khandekar
Huawei Technologies
On Sat, May 23, 2020 at 10:33:43PM +0530, Amit Khandekar wrote: > By inlining of the two functions, found noticeable reduction in > execution time as shown (figures are in milliseconds, averaged over > multiple runs; taken from 'explain analyze' execution times) : > ARM VM : > HEAD : 100 ; Patched : 88 => 13.6% improvement > x86 VM : > HEAD : 71 ; Patched : 66 => 7.63% improvement. > > Then I included many assignment statements as shown in attachment > assignmany.sql. This showed further benefit : > ARM VM : > HEAD : 1820 ; Patched : 1549 => 17.5% improvement > x86 VM : > HEAD : 1020 ; Patched : 869 => 17.4% improvement > > Inlining just exec_stmt() showed the improvement mainly on the arm64 > VM (7.4%). For x86, it was 2.7% > But inlining exec_stmt() and exec_cast_value() together showed > benefits on both machines, as can be seen above. This stuff is interesting. Do you have some perf profiles to share? I am wondering what's the effect of the inlining with your test cases. -- Michael
Attachment
On Sun, 31 May 2020 at 08:04, Michael Paquier <michael@paquier.xyz> wrote: > This stuff is interesting. Do you have some perf profiles to share? > I am wondering what's the effect of the inlining with your test > cases. Below are the perf numbers for asignmany.sql : HEAD : + 16.88% postgres postgres [.] CachedPlanIsSimplyValid + 16.64% postgres plpgsql.so [.] exec_stmt + 15.56% postgres plpgsql.so [.] exec_eval_expr + 13.58% postgres plpgsql.so [.] exec_assign_value + 7.49% postgres plpgsql.so [.] exec_cast_value + 7.17% postgres plpgsql.so [.] exec_assign_expr + 5.39% postgres postgres [.] MemoryContextReset + 3.91% postgres postgres [.] ExecJustConst + 3.33% postgres postgres [.] recomputeNamespacePath + 2.88% postgres postgres [.] OverrideSearchPathMatchesCurrent + 2.18% postgres plpgsql.so [.] exec_eval_cleanup.isra.17 + 2.15% postgres plpgsql.so [.] exec_stmts + 1.32% postgres plpgsql.so [.] MemoryContextReset@plt + 0.57% postgres plpgsql.so [.] CachedPlanIsSimplyValid@plt + 0.57% postgres postgres [.] GetUserId 0.30% postgres plpgsql.so [.] assign_simple_var.isra.13 0.05% postgres [kernel.kallsyms] [k] unmap_page_range Patched : + 18.22% postgres postgres [.] CachedPlanIsSimplyValid + 17.25% postgres plpgsql.so [.] exec_eval_expr + 16.31% postgres plpgsql.so [.] exec_stmts + 15.00% postgres plpgsql.so [.] exec_assign_value + 7.56% postgres plpgsql.so [.] exec_assign_expr + 5.64% postgres postgres [.] MemoryContextReset + 5.16% postgres postgres [.] ExecJustConst + 4.86% postgres postgres [.] recomputeNamespacePath + 4.54% postgres postgres [.] OverrideSearchPathMatchesCurrent + 2.33% postgres plpgsql.so [.] exec_eval_cleanup.isra.17 + 1.26% postgres plpgsql.so [.] MemoryContextReset@plt + 0.81% postgres postgres [.] GetUserId + 0.71% postgres plpgsql.so [.] CachedPlanIsSimplyValid@plt 0.26% postgres plpgsql.so [.] assign_simple_var.isra.13 0.03% postgres [kernel.kallsyms] [k] unmap_page_range 0.02% postgres [kernel.kallsyms] [k] mark_page_accessed Notice the reduction in percentages : HEAD : exec_stmts + exec_stmt = 18.79 Patched : exec_stmts = 16.31 HEAD : exec_assign_value + exec_cast_value : 21.07 Patched : exec_assign_value = 15.00 As expected, reduction of percentage in these two functions caused other functions like CachedPlanIsSimplyValid() and exec_eval_expr() to show rise in their percentages.
On Sat, 30 May 2020 at 11:11, Pavel Stehule <pavel.stehule@gmail.com> wrote: > I think so the effect of these patches strongly depends on CPU and compile I quickly tried pi() with gcc 10 as well, and saw more or less the same benefit. I think, we are bound to see some differences in the benefits across architectures, kernels and compilers; but looks like some benefit is always there. > but it is micro optimization, and when I look to profiler, the bottle neck is elsewhere. Please check the perf numbers in my reply to Michael. I suppose you meant CachedPlanIsSimplyValid() when you say the bottle neck is elsewhere ? Yeah, this function is always the hottest spot, which I recall is being discussed elsewhere. But I always see exec_stmt(), exec_assign_value as the next functions. >> > patch 0002 it is little bit against simplicity, but for PLpgSQL with blocks structure it is correct. >> >> Here, I moved the exec_stmt code into exec_stmts() function because >> exec_stmts() was the only caller, and that function is not that big. I >> am assuming you were referring to this point when you said it is a bit >> against simplicity. But I didn't get what you implied by "but for >> PLpgSQL with blocks structure it is correct" > > > Nested statement in PLpgSQL is always a list of statements. It is not single statement ever. So is not too strange don'thave a function execute_stmt. Right.
po 1. 6. 2020 v 8:15 odesílatel Amit Khandekar <amitdkhan.pg@gmail.com> napsal:
On Sat, 30 May 2020 at 11:11, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> I think so the effect of these patches strongly depends on CPU and compile
I quickly tried pi() with gcc 10 as well, and saw more or less the
same benefit. I think, we are bound to see some differences in the
benefits across architectures, kernels and compilers; but looks like
some benefit is always there.
> but it is micro optimization, and when I look to profiler, the bottle neck is elsewhere.
Please check the perf numbers in my reply to Michael. I suppose you
meant CachedPlanIsSimplyValid() when you say the bottle neck is
elsewhere ? Yeah, this function is always the hottest spot, which I
recall is being discussed elsewhere. But I always see exec_stmt(),
exec_assign_value as the next functions.
It is hard to read the profile result, because these functions are nested together. For your example
18.22% postgres postgres [.] CachedPlanIsSimplyValid
Is little bit strange, and probably this is real bottleneck in your simple example, and maybe some work can be done there, because you assign just constant.
On second hand, your example is pretty unrealistic - and against any developer's best practices for writing cycles.
I think so we can look on PostGIS, where is some computing heavy routines in PLpgSQL, and we can look on real profiles.
Probably the most people will have benefit from these optimization.
>> > patch 0002 it is little bit against simplicity, but for PLpgSQL with blocks structure it is correct.
>>
>> Here, I moved the exec_stmt code into exec_stmts() function because
>> exec_stmts() was the only caller, and that function is not that big. I
>> am assuming you were referring to this point when you said it is a bit
>> against simplicity. But I didn't get what you implied by "but for
>> PLpgSQL with blocks structure it is correct"
>
>
> Nested statement in PLpgSQL is always a list of statements. It is not single statement ever. So is not too strange don't have a function execute_stmt.
Right.
On Mon, 1 Jun 2020 at 12:27, Pavel Stehule <pavel.stehule@gmail.com> wrote: > po 1. 6. 2020 v 8:15 odesílatel Amit Khandekar <amitdkhan.pg@gmail.com> napsal: >> >> On Sat, 30 May 2020 at 11:11, Pavel Stehule <pavel.stehule@gmail.com> wrote: >> > I think so the effect of these patches strongly depends on CPU and compile >> >> I quickly tried pi() with gcc 10 as well, and saw more or less the >> same benefit. I think, we are bound to see some differences in the >> benefits across architectures, kernels and compilers; but looks like >> some benefit is always there. >> >> > but it is micro optimization, and when I look to profiler, the bottle neck is elsewhere. >> >> Please check the perf numbers in my reply to Michael. I suppose you >> meant CachedPlanIsSimplyValid() when you say the bottle neck is >> elsewhere ? Yeah, this function is always the hottest spot, which I >> recall is being discussed elsewhere. But I always see exec_stmt(), >> exec_assign_value as the next functions. > > > It is hard to read the profile result, because these functions are nested together. For your example > > 18.22% postgres postgres [.] CachedPlanIsSimplyValid > > Is little bit strange, and probably this is real bottleneck in your simple example, and maybe some work can be done there,because you assign just constant. I had earlier had a quick look on this one. CachedPlanIsSimplyValid() was, I recall, hitting a hotspot when it tries to access plansource->search_path (possibly cacheline miss). But didn't get a chance to further dig on that. For now, i am focusing on these other functions for which the patches were submitted. > > On second hand, your example is pretty unrealistic - and against any developer's best practices for writing cycles. > > I think so we can look on PostGIS, where is some computing heavy routines in PLpgSQL, and we can look on real profiles. > > Probably the most people will have benefit from these optimization. I understand it's not a real world example. For generating perf figures, I had to use an example which amplifies the benefits, so that the effect of the patches on the perf figures also becomes visible. Hence, used that example. I had shown the benefits up-thread using a practical function avg_space(). But the perf figures for that example were varying a lot. So below, what I did was : Run the avg_space() ~150 times, and took perf report. This stabilized the results a bit : HEAD : + 16.10% 17.29% 16.82% postgres postgres [.] ExecInterpExpr + 13.80% 13.56% 14.49% postgres plpgsql.so [.] exec_assign_value + 12.64% 12.10% 12.09% postgres plpgsql.so [.] plpgsql_param_eval_var + 12.15% 11.28% 11.05% postgres plpgsql.so [.] exec_stmt + 10.81% 10.24% 10.55% postgres plpgsql.so [.] exec_eval_expr + 9.50% 9.35% 9.37% postgres plpgsql.so [.] exec_cast_value ..... + 1.19% 1.06% 1.21% postgres plpgsql.so [.] exec_stmts 0001+0002 patches applied (i.e. inline exec_stmt) : + 16.90% 17.20% 16.54% postgres postgres [.] ExecInterpExpr + 16.42% 15.37% 15.28% postgres plpgsql.so [.] exec_assign_value + 11.34% 11.92% 11.93% postgres plpgsql.so [.] plpgsql_param_eval_var + 11.18% 11.86% 10.99% postgres plpgsql.so [.] exec_stmts.part.0 + 10.51% 9.52% 10.61% postgres plpgsql.so [.] exec_eval_expr + 9.39% 9.48% 9.30% postgres plpgsql.so [.] exec_cast_value HEAD : exec_stmts + exec_stmt = ~12.7 % Patched (0001+0002): exec_stmts = 11.3 % Just 0003 patch applied (i.e. inline exec_cast_value) : + 17.00% 16.77% 17.09% postgres postgres [.] ExecInterpExpr + 15.21% 15.64% 15.09% postgres plpgsql.so [.] exec_assign_value + 14.48% 14.06% 13.94% postgres plpgsql.so [.] exec_stmt + 13.26% 13.30% 13.14% postgres plpgsql.so [.] plpgsql_param_eval_var + 11.48% 11.64% 12.66% postgres plpgsql.so [.] exec_eval_expr .... + 1.03% 0.85% 0.87% postgres plpgsql.so [.] exec_stmts HEAD : exec_assign_value + exec_cast_value = ~23.4 % Patched (0001+0002): exec_assign_value = 15.3% Time in milliseconds after calling avg_space() 150 times : HEAD : 7210 Patch 0001+0002 : 6925 Patch 0003 : 6670 Patch 0001+0002+0003 : 6346
po 1. 6. 2020 v 15:59 odesílatel Amit Khandekar <amitdkhan.pg@gmail.com> napsal:
On Mon, 1 Jun 2020 at 12:27, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> po 1. 6. 2020 v 8:15 odesílatel Amit Khandekar <amitdkhan.pg@gmail.com> napsal:
>>
>> On Sat, 30 May 2020 at 11:11, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>> > I think so the effect of these patches strongly depends on CPU and compile
>>
>> I quickly tried pi() with gcc 10 as well, and saw more or less the
>> same benefit. I think, we are bound to see some differences in the
>> benefits across architectures, kernels and compilers; but looks like
>> some benefit is always there.
>>
>> > but it is micro optimization, and when I look to profiler, the bottle neck is elsewhere.
>>
>> Please check the perf numbers in my reply to Michael. I suppose you
>> meant CachedPlanIsSimplyValid() when you say the bottle neck is
>> elsewhere ? Yeah, this function is always the hottest spot, which I
>> recall is being discussed elsewhere. But I always see exec_stmt(),
>> exec_assign_value as the next functions.
>
>
> It is hard to read the profile result, because these functions are nested together. For your example
>
> 18.22% postgres postgres [.] CachedPlanIsSimplyValid
>
> Is little bit strange, and probably this is real bottleneck in your simple example, and maybe some work can be done there, because you assign just constant.
I had earlier had a quick look on this one. CachedPlanIsSimplyValid()
was, I recall, hitting a hotspot when it tries to access
plansource->search_path (possibly cacheline miss). But didn't get a
chance to further dig on that. For now, i am focusing on these other
functions for which the patches were submitted.
>
> On second hand, your example is pretty unrealistic - and against any developer's best practices for writing cycles.
>
> I think so we can look on PostGIS, where is some computing heavy routines in PLpgSQL, and we can look on real profiles.
>
> Probably the most people will have benefit from these optimization.
I understand it's not a real world example. For generating perf
figures, I had to use an example which amplifies the benefits, so that
the effect of the patches on the perf figures also becomes visible.
Hence, used that example. I had shown the benefits up-thread using a
practical function avg_space(). But the perf figures for that example
were varying a lot.
So below, what I did was : Run the avg_space() ~150 times, and took
perf report. This stabilized the results a bit :
HEAD :
+ 16.10% 17.29% 16.82% postgres postgres [.]
ExecInterpExpr
+ 13.80% 13.56% 14.49% postgres plpgsql.so [.]
exec_assign_value
+ 12.64% 12.10% 12.09% postgres plpgsql.so [.]
plpgsql_param_eval_var
+ 12.15% 11.28% 11.05% postgres plpgsql.so [.]
exec_stmt
+ 10.81% 10.24% 10.55% postgres plpgsql.so [.]
exec_eval_expr
+ 9.50% 9.35% 9.37% postgres plpgsql.so [.]
exec_cast_value
.....
+ 1.19% 1.06% 1.21% postgres plpgsql.so [.]
exec_stmts
0001+0002 patches applied (i.e. inline exec_stmt) :
+ 16.90% 17.20% 16.54% postgres postgres [.]
ExecInterpExpr
+ 16.42% 15.37% 15.28% postgres plpgsql.so [.]
exec_assign_value
+ 11.34% 11.92% 11.93% postgres plpgsql.so [.]
plpgsql_param_eval_var
+ 11.18% 11.86% 10.99% postgres plpgsql.so [.] exec_stmts.part.0
+ 10.51% 9.52% 10.61% postgres plpgsql.so [.]
exec_eval_expr
+ 9.39% 9.48% 9.30% postgres plpgsql.so [.]
exec_cast_value
HEAD : exec_stmts + exec_stmt = ~12.7 %
Patched (0001+0002): exec_stmts = 11.3 %
Just 0003 patch applied (i.e. inline exec_cast_value) :
+ 17.00% 16.77% 17.09% postgres postgres [.] ExecInterpExpr
+ 15.21% 15.64% 15.09% postgres plpgsql.so [.] exec_assign_value
+ 14.48% 14.06% 13.94% postgres plpgsql.so [.] exec_stmt
+ 13.26% 13.30% 13.14% postgres plpgsql.so [.]
plpgsql_param_eval_var
+ 11.48% 11.64% 12.66% postgres plpgsql.so [.] exec_eval_expr
....
+ 1.03% 0.85% 0.87% postgres plpgsql.so [.] exec_stmts
HEAD : exec_assign_value + exec_cast_value = ~23.4 %
Patched (0001+0002): exec_assign_value = 15.3%
Time in milliseconds after calling avg_space() 150 times :
HEAD : 7210
Patch 0001+0002 : 6925
Patch 0003 : 6670
Patch 0001+0002+0003 : 6346
Is your patch in commitfest in commitfest application?
Regards
Pavel
On Tue, 9 Jun 2020 at 21:49, Pavel Stehule <pavel.stehule@gmail.com> wrote: > Is your patch in commitfest in commitfest application? Thanks for reminding me. Just added. https://commitfest.postgresql.org/28/2590/
Amit Khandekar <amitdkhan.pg@gmail.com> writes: > There are a couple of function call overheads I observed in pl/pgsql > code : exec_stmt() and exec_cast_value(). Removing these overheads > resulted in some performance gains. I took a look at the 0001/0002 patches (not 0003 as yet). I do not like 0001 too much. The most concrete problem with it is that you broke translatability of the error messages, cf the first translatability guideline at [1]. While that could be fixed by passing the entire message not just part of it, I don't see anything that we're gaining by moving that stuff into exec_toplevel_block in the first place. Certainly, passing a string that describes what will happen *after* exec_toplevel_block is just weird. I think what you've got here is a very arbitrary chopping-up of the existing code based on chance similarities of the existing callers. I think we're better off to make exec_toplevel_block be as nearly as possible a match for exec_stmts' semantics. Hence, I propose the attached 0001 to replace 0001/0002. This should be basically indistinguishable performance-wise, though I have not tried to benchmark. Note that for reviewability's sake, I did not reindent the former body of exec_stmt, though we'd want to do that before committing. Also, 0002 is a small patch on top of that to avoid redundant saves and restores of estate->err_stmt within the loop in exec_stmts. This may well not be a measurable improvement, but it's a pretty obvious inefficiency in exec_stmts now that it's refactored this way. regards, tom lane [1] https://www.postgresql.org/docs/devel/nls-programmer.html#NLS-GUIDELINES diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c index f41d675d65..b02567c88d 100644 --- a/src/pl/plpgsql/src/pl_exec.c +++ b/src/pl/plpgsql/src/pl_exec.c @@ -260,12 +260,12 @@ static MemoryContext get_stmt_mcontext(PLpgSQL_execstate *estate); static void push_stmt_mcontext(PLpgSQL_execstate *estate); static void pop_stmt_mcontext(PLpgSQL_execstate *estate); +static int exec_toplevel_block(PLpgSQL_execstate *estate, + PLpgSQL_stmt_block *block); static int exec_stmt_block(PLpgSQL_execstate *estate, PLpgSQL_stmt_block *block); static int exec_stmts(PLpgSQL_execstate *estate, List *stmts); -static int exec_stmt(PLpgSQL_execstate *estate, - PLpgSQL_stmt *stmt); static int exec_stmt_assign(PLpgSQL_execstate *estate, PLpgSQL_stmt_assign *stmt); static int exec_stmt_perform(PLpgSQL_execstate *estate, @@ -599,8 +599,7 @@ plpgsql_exec_function(PLpgSQL_function *func, FunctionCallInfo fcinfo, * Now call the toplevel block of statements */ estate.err_text = NULL; - estate.err_stmt = (PLpgSQL_stmt *) (func->action); - rc = exec_stmt(&estate, (PLpgSQL_stmt *) func->action); + rc = exec_toplevel_block(&estate, func->action); if (rc != PLPGSQL_RC_RETURN) { estate.err_stmt = NULL; @@ -1015,8 +1014,7 @@ plpgsql_exec_trigger(PLpgSQL_function *func, * Now call the toplevel block of statements */ estate.err_text = NULL; - estate.err_stmt = (PLpgSQL_stmt *) (func->action); - rc = exec_stmt(&estate, (PLpgSQL_stmt *) func->action); + rc = exec_toplevel_block(&estate, func->action); if (rc != PLPGSQL_RC_RETURN) { estate.err_stmt = NULL; @@ -1176,8 +1174,7 @@ plpgsql_exec_event_trigger(PLpgSQL_function *func, EventTriggerData *trigdata) * Now call the toplevel block of statements */ estate.err_text = NULL; - estate.err_stmt = (PLpgSQL_stmt *) (func->action); - rc = exec_stmt(&estate, (PLpgSQL_stmt *) func->action); + rc = exec_toplevel_block(&estate, func->action); if (rc != PLPGSQL_RC_RETURN) { estate.err_stmt = NULL; @@ -1584,6 +1581,38 @@ exception_matches_conditions(ErrorData *edata, PLpgSQL_condition *cond) } +/* ---------- + * exec_toplevel_block Execute the toplevel block + * + * This is intentionally equivalent to executing exec_stmts() with a + * list consisting of the one statement. One tiny difference is that + * we do not bother to save and restore estate->err_stmt; the caller + * is expected to clear that after we return. + * ---------- + */ +static int +exec_toplevel_block(PLpgSQL_execstate *estate, PLpgSQL_stmt_block *block) +{ + int rc; + + estate->err_stmt = (PLpgSQL_stmt *) block; + + /* Let the plugin know that we are about to execute this statement */ + if (*plpgsql_plugin_ptr && (*plpgsql_plugin_ptr)->stmt_beg) + ((*plpgsql_plugin_ptr)->stmt_beg) (estate, (PLpgSQL_stmt *) block); + + CHECK_FOR_INTERRUPTS(); + + rc = exec_stmt_block(estate, block); + + /* Let the plugin know that we have finished executing this statement */ + if (*plpgsql_plugin_ptr && (*plpgsql_plugin_ptr)->stmt_end) + ((*plpgsql_plugin_ptr)->stmt_end) (estate, (PLpgSQL_stmt *) block); + + return rc; +} + + /* ---------- * exec_stmt_block Execute a block of statements * ---------- @@ -1933,24 +1962,6 @@ exec_stmts(PLpgSQL_execstate *estate, List *stmts) foreach(s, stmts) { PLpgSQL_stmt *stmt = (PLpgSQL_stmt *) lfirst(s); - int rc = exec_stmt(estate, stmt); - - if (rc != PLPGSQL_RC_OK) - return rc; - } - - return PLPGSQL_RC_OK; -} - - -/* ---------- - * exec_stmt Distribute one statement to the statements - * type specific execution function. - * ---------- - */ -static int -exec_stmt(PLpgSQL_execstate *estate, PLpgSQL_stmt *stmt) -{ PLpgSQL_stmt *save_estmt; int rc = -1; @@ -2088,7 +2099,11 @@ exec_stmt(PLpgSQL_execstate *estate, PLpgSQL_stmt *stmt) estate->err_stmt = save_estmt; - return rc; + if (rc != PLPGSQL_RC_OK) + return rc; + } /* end of loop over statements */ + + return PLPGSQL_RC_OK; } diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c index b02567c88d..8bc4a7a3d2 100644 --- a/src/pl/plpgsql/src/pl_exec.c +++ b/src/pl/plpgsql/src/pl_exec.c @@ -1946,6 +1946,7 @@ exec_stmt_block(PLpgSQL_execstate *estate, PLpgSQL_stmt_block *block) static int exec_stmts(PLpgSQL_execstate *estate, List *stmts) { + PLpgSQL_stmt *save_estmt = estate->err_stmt; ListCell *s; if (stmts == NIL) @@ -1962,10 +1963,8 @@ exec_stmts(PLpgSQL_execstate *estate, List *stmts) foreach(s, stmts) { PLpgSQL_stmt *stmt = (PLpgSQL_stmt *) lfirst(s); - PLpgSQL_stmt *save_estmt; int rc = -1; - save_estmt = estate->err_stmt; estate->err_stmt = stmt; /* Let the plugin know that we are about to execute this statement */ @@ -2097,12 +2096,14 @@ exec_stmts(PLpgSQL_execstate *estate, List *stmts) if (*plpgsql_plugin_ptr && (*plpgsql_plugin_ptr)->stmt_end) ((*plpgsql_plugin_ptr)->stmt_end) (estate, stmt); - estate->err_stmt = save_estmt; - if (rc != PLPGSQL_RC_OK) + { + estate->err_stmt = save_estmt; return rc; + } } /* end of loop over statements */ + estate->err_stmt = save_estmt; return PLPGSQL_RC_OK; }
On Thu, 2 Jul 2020 at 03:47, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Amit Khandekar <amitdkhan.pg@gmail.com> writes: > > There are a couple of function call overheads I observed in pl/pgsql > > code : exec_stmt() and exec_cast_value(). Removing these overheads > > resulted in some performance gains. > > I took a look at the 0001/0002 patches (not 0003 as yet). I do not > like 0001 too much. The most concrete problem with it is that > you broke translatability of the error messages, cf the first > translatability guideline at [1]. Yeah, I thought we can safely use %s for proper nouns such as "trigger procedure" or "function" as those would not be translated. But looks like even if they won't be translated, the difference in word order among languages might create problems with this. > While that could be fixed by passing > the entire message not just part of it, I don't see anything that we're > gaining by moving that stuff into exec_toplevel_block in the first place. > Certainly, passing a string that describes what will happen *after* > exec_toplevel_block is just weird. I think what you've got here is > a very arbitrary chopping-up of the existing code based on chance > similarities of the existing callers. I think we're better off to make > exec_toplevel_block be as nearly as possible a match for exec_stmts' > semantics. I thought some of those things that I kept in exec_toplevel_block() do look like they belong to a top level function. But what you are saying also makes sense : better to keep it similar to exec_stmts. > > Hence, I propose the attached 0001 to replace 0001/0002. This should > be basically indistinguishable performance-wise, though I have not > tried to benchmark. Thanks for the patches. Yeah, performance-wise it does look similar; but anyways I tried running, and got similar performance numbers. > Note that for reviewability's sake, I did not > reindent the former body of exec_stmt, though we'd want to do that > before committing. Right. > > Also, 0002 is a small patch on top of that to avoid redundant saves > and restores of estate->err_stmt within the loop in exec_stmts. This > may well not be a measurable improvement, but it's a pretty obvious > inefficiency in exec_stmts now that it's refactored this way. 0002 also makes sense.
I did some performance testing on 0001+0002 here, and found that for me, there's basically no change on x86_64 but a win of 2 to 3 percent on aarch64, using Pavel's pi_est_1() as a representative case for simple plpgsql statements. That squares with your original results I believe. It's not clear to me whether any of the later tests in this thread measured these changes in isolation, or only with 0003 added. Anyway, that's good enough for me, so I pushed 0001+0002 after a little bit of additional cosmetic tweaking. I attach your original 0003 here (it still applies, with some line offsets). That's just so the cfbot doesn't get confused about what it's supposed to test now. regards, tom lane From 56aac7dff8243ff6dc9b8e72651cb1d9a018f1b3 Mon Sep 17 00:00:00 2001 From: Amit Khandekar <amitdkhan.pg@gmail.com> Date: Sat, 23 May 2020 21:53:24 +0800 Subject: [PATCH 3/3] Inline exec_cast_value(). This function does not do the casting if not required. So move the code that actually does the cast into a separate function, so as to reduce the exec_cast_value() code and make it inline. There are frequent calls of this function, so inlining it has shown to improve performance by as much as 14% --- src/pl/plpgsql/src/pl_exec.c | 63 ++++++++++++++++++++++++------------ 1 file changed, 43 insertions(+), 20 deletions(-) diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c index d5377a6dad..4028a3f0f6 100644 --- a/src/pl/plpgsql/src/pl_exec.c +++ b/src/pl/plpgsql/src/pl_exec.c @@ -425,10 +425,14 @@ static void instantiate_empty_record_variable(PLpgSQL_execstate *estate, PLpgSQL_rec *rec); static char *convert_value_to_string(PLpgSQL_execstate *estate, Datum value, Oid valtype); -static Datum exec_cast_value(PLpgSQL_execstate *estate, +static inline Datum exec_cast_value(PLpgSQL_execstate *estate, Datum value, bool *isnull, Oid valtype, int32 valtypmod, Oid reqtype, int32 reqtypmod); +static Datum do_cast_value(PLpgSQL_execstate *estate, + Datum value, bool *isnull, + Oid valtype, int32 valtypmod, + Oid reqtype, int32 reqtypmod); static plpgsql_CastHashEntry *get_cast_hashentry(PLpgSQL_execstate *estate, Oid srctype, int32 srctypmod, Oid dsttype, int32 dsttypmod); @@ -7764,9 +7768,11 @@ convert_value_to_string(PLpgSQL_execstate *estate, Datum value, Oid valtype) * also contain the result Datum if we have to do a conversion to a pass- * by-reference data type. Be sure to do an exec_eval_cleanup() call when * done with the result. + * The actual code to cast is kept outside of this function, to keep it short + * since it is an inline function, being called frequently. * ---------- */ -static Datum +static inline Datum exec_cast_value(PLpgSQL_execstate *estate, Datum value, bool *isnull, Oid valtype, int32 valtypmod, @@ -7777,31 +7783,48 @@ exec_cast_value(PLpgSQL_execstate *estate, */ if (valtype != reqtype || (valtypmod != reqtypmod && reqtypmod != -1)) - { - plpgsql_CastHashEntry *cast_entry; + value = do_cast_value(estate, value, isnull, valtype, valtypmod, + reqtype, reqtypmod); - cast_entry = get_cast_hashentry(estate, - valtype, valtypmod, - reqtype, reqtypmod); - if (cast_entry) - { - ExprContext *econtext = estate->eval_econtext; - MemoryContext oldcontext; + return value; +} - oldcontext = MemoryContextSwitchTo(get_eval_mcontext(estate)); +/* ---------- + * do_cast_value cast the input value. + * + * Returns the cast value. + * Check comments in the wrapper function exec_cast_value(). + * ---------- + */ +static Datum +do_cast_value(PLpgSQL_execstate *estate, + Datum value, bool *isnull, + Oid valtype, int32 valtypmod, + Oid reqtype, int32 reqtypmod) +{ + plpgsql_CastHashEntry *cast_entry; - econtext->caseValue_datum = value; - econtext->caseValue_isNull = *isnull; + cast_entry = get_cast_hashentry(estate, + valtype, valtypmod, + reqtype, reqtypmod); + if (cast_entry) + { + ExprContext *econtext = estate->eval_econtext; + MemoryContext oldcontext; - cast_entry->cast_in_use = true; + oldcontext = MemoryContextSwitchTo(get_eval_mcontext(estate)); - value = ExecEvalExpr(cast_entry->cast_exprstate, econtext, - isnull); + econtext->caseValue_datum = value; + econtext->caseValue_isNull = *isnull; - cast_entry->cast_in_use = false; + cast_entry->cast_in_use = true; - MemoryContextSwitchTo(oldcontext); - } + value = ExecEvalExpr(cast_entry->cast_exprstate, econtext, + isnull); + + cast_entry->cast_in_use = false; + + MemoryContextSwitchTo(oldcontext); } return value; -- 2.17.1
I wrote: > I attach your original 0003 here (it still applies, with some line > offsets). That's just so the cfbot doesn't get confused about what > it's supposed to test now. Pushed that part now, too. BTW, the first test run I did on this (on x86_64) was actually several percent *slower* than HEAD. I couldn't reproduce that after restarting the postmaster; all later tests concurred that there was a speedup. So I suppose that was just some phase-of-the-moon effect, perhaps caused by an ASLR-dependent collision of bits of code in processor cache. Still, that illustrates the difficulty of getting useful, reproducible improvements when doing this kind of hacking. I tend to think that most of the time we're better off leaving this to the compiler. regards, tom lane
On Sat, 4 Jul 2020 at 01:19, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > I did some performance testing on 0001+0002 here, and found that > for me, there's basically no change on x86_64 but a win of 2 to 3 > percent on aarch64, using Pavel's pi_est_1() as a representative > case for simple plpgsql statements. That squares with your original > results I believe. It's not clear to me whether any of the later > tests in this thread measured these changes in isolation, or only > with 0003 added. Yeah I had the same observation. 0001+0002 seems to benefit mostly on aarch64. And 0003 (exec_case_value) benefited both on amd64 and aarch64. > > Anyway, that's good enough for me, so I pushed 0001+0002 after a > little bit of additional cosmetic tweaking. > > I attach your original 0003 here (it still applies, with some line > offsets). That's just so the cfbot doesn't get confused about what > it's supposed to test now. Thanks for pushing all the three ! Thanks, -Amit Khandekar Huawei Technologies