I observed a couple of function call overheads in the PL/pgSQL code:
exec_stmt() and exec_cast_value(). Removing these overheads resulted
in some performance gains.
exec_stmt() :
plpgsql_exec_function() and the other toplevel block executors
currently call exec_stmt(), but they don't need everything that
exec_stmt() does. They can call a new function instead of
exec_stmt(), and all of the exec_stmt() code can then be moved into
exec_stmts(). The things that exec_stmt() does that are not necessary
for a toplevel block statement are:
1. save_estmt = estate->err_stmt; estate->err_stmt = stmt;
For toplevel blocks, saving estate->err_stmt is not necessary,
because there is no statement after this block statement. Anyway,
plpgsql_exec_function() assigns estate.err_stmt just before calling
exec_stmt(), so there is no point in exec_stmt() setting it again.
2. CHECK_FOR_INTERRUPTS()
This is not necessary for toplevel block callers.
3. exec_stmt_block() can be directly called rather than exec_stmt()
because func->action is a block statement. So the switch statement is
not necessary.
But this one might still be necessary for a toplevel block statement:
if (*plpgsql_plugin_ptr && (*plpgsql_plugin_ptr)->stmt_beg)
((*plpgsql_plugin_ptr)->stmt_beg) (estate, stmt);
There was already some repetitive code in plpgsql_exec_function() and
the other functions around the exec_stmt() call. So in a separate
patch, 0001*.patch, I moved that code into a common function,
exec_toplevel_block(). In the main patch,
0002-Get-rid-of-exec_stmt-function-call.patch, I additionally call
plpgsql_plugin_ptr->stmt_beg() inside exec_toplevel_block(), and move
the exec_stmt() code into exec_stmts().
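
To make the restructuring easier to picture, here is a minimal
standalone sketch of the idea. It is not the patch itself: every Toy*
type and toy_* function is a hypothetical stand-in for the real
PL/pgSQL structures, the interrupt check is simulated with a counter,
and I am assuming that the toplevel helper is where err_stmt gets set.
The per-statement work lives in the statement loop, while the toplevel
entry point only runs the plugin hook and calls the block executor
directly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for PLpgSQL_stmt / PLpgSQL_execstate. */
typedef enum { TOY_STMT_BLOCK, TOY_STMT_ASSIGN } ToyStmtType;

typedef struct ToyStmt
{
    ToyStmtType type;
    struct ToyStmt **body;      /* child statements (blocks only) */
    int         nbody;
} ToyStmt;

typedef struct ToyEstate ToyEstate;

typedef struct ToyPlugin
{
    void        (*stmt_beg) (ToyEstate *estate, ToyStmt *stmt);
} ToyPlugin;

struct ToyEstate
{
    ToyStmt    *err_stmt;           /* statement reported on error */
    ToyPlugin  *plugin;             /* optional instrumentation hook */
    int         interrupt_checks;   /* simulated CHECK_FOR_INTERRUPTS() */
    int         assigns_run;        /* number of assignments executed */
};

static int toy_exec_stmt_block(ToyEstate *estate, ToyStmt *block);

/* The per-statement bookkeeping is done here, inside the loop. */
static int
toy_exec_stmts(ToyEstate *estate, ToyStmt **stmts, int n)
{
    for (int i = 0; i < n; i++)
    {
        ToyStmt    *stmt = stmts[i];
        ToyStmt    *save_estmt = estate->err_stmt;
        int         rc = 0;

        estate->err_stmt = stmt;
        if (estate->plugin && estate->plugin->stmt_beg)
            estate->plugin->stmt_beg(estate, stmt);
        estate->interrupt_checks++;

        switch (stmt->type)
        {
            case TOY_STMT_BLOCK:
                rc = toy_exec_stmt_block(estate, stmt);
                break;
            case TOY_STMT_ASSIGN:
                estate->assigns_run++;
                break;
        }

        estate->err_stmt = save_estmt;
        if (rc != 0)
            return rc;
    }
    return 0;
}

static int
toy_exec_stmt_block(ToyEstate *estate, ToyStmt *block)
{
    return toy_exec_stmts(estate, block->body, block->nbody);
}

/* Toplevel entry: no save/restore, no interrupt check, no switch. */
static int
toy_exec_toplevel_block(ToyEstate *estate, ToyStmt *block)
{
    assert(block->type == TOY_STMT_BLOCK);
    estate->err_stmt = block;
    if (estate->plugin && estate->plugin->stmt_beg)
        estate->plugin->stmt_beg(estate, block);
    return toy_exec_stmt_block(estate, block);
}
```

A caller would build a ToyStmt of type TOY_STMT_BLOCK and pass it to
toy_exec_toplevel_block(); the nested statements still get the full
per-statement treatment, only the outermost block skips it.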
exec_cast_value() :
This function does not do any casting if none is required. So I moved
the code that actually does the cast into a separate function, which
shrinks exec_cast_value() enough to make it inline. Attached is
0003-Inline-exec_cast_value.patch.
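
The general pattern is a small inline wrapper that tests whether a
cast is needed, plus an out-of-line function for the rare case where
it is. The following is only a sketch under assumptions: the toy_*
names, the TOY_* type codes and the conversion rule are made up for
illustration, and the condition is modelled on the usual "type or
typmod differs" test rather than copied from the patch.

```c
/* Hypothetical type codes, standing in for type OIDs. */
enum { TOY_INT4 = 1, TOY_BOOL = 2 };

/* Counts how often the slow path is taken (for illustration only). */
static int toy_slow_casts = 0;

/* Out-of-line slow path: stands in for the real conversion work. */
static long
toy_do_cast_value(long value, int valtype, int reqtype)
{
    (void) valtype;             /* unused in this toy conversion */
    toy_slow_casts++;
    if (reqtype == TOY_BOOL)
        return value != 0;      /* made-up conversion rule */
    return value;
}

/* Inline fast path: only the cheap comparison stays in the caller. */
static inline long
toy_exec_cast_value(long value,
                    int valtype, int valtypmod,
                    int reqtype, int reqtypmod)
{
    if (valtype != reqtype ||
        (valtypmod != reqtypmod && reqtypmod != -1))
        value = toy_do_cast_value(value, valtype, reqtype);
    return value;
}
```

When source and target types match, as in a loop that assigns an
integer to an integer variable, the call reduces to a couple of
comparisons and toy_do_cast_value() is never entered.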
Testing
----------
I used two available VMs (one x86_64 and the other arm64), and the
benefit showed up on both of these machines. Attached patches 0001,
0002, 0003 are to be applied in that order. 0001 is just a preparatory
patch.
First I tried a simple for loop with a single assignment (attached
forcounter.sql). With the two functions inlined, I found a noticeable
reduction in execution time, as shown below (figures are in
milliseconds, averaged over multiple runs, and taken from the
'explain analyze' execution times):
ARM VM :
HEAD : 100 ; Patched : 88 => 13.6% improvement
x86 VM :
HEAD : 71 ; Patched : 66 => 7.63% improvement.
Then I included many assignment statements, as shown in the attached
assignmany.sql. This showed a further benefit:
ARM VM :
HEAD : 1820 ; Patched : 1549 => 17.5% improvement
x86 VM :
HEAD : 1020 ; Patched : 869 => 17.4% improvement
Inlining just exec_stmt() showed an improvement mainly on the arm64
VM (7.4%); on x86 it was 2.7%. But inlining exec_stmt() and
exec_cast_value() together showed benefits on both machines, as can be
seen above.
--
Thanks,
-Amit Khandekar
Huawei Technologies