Thread: patch: reduce overhead of execution of CALL statement in no atomic mode from PL/pgSQL

Hi,

here is another patch related to using CALL statement inside PL/pgSQL code.

A repeated using of CALL statement is expensive. How much?

I wrote synthetic test:

CREATE TABLE foo(a int, b int, c int);

CREATE OR REPLACE PROCEDURE public.simple_proc3(a integer, b integer, c integer, cnt int, OUT r boolean)
AS $$
BEGIN
  INSERT INTO foo VALUES(a, b, c);
  IF cnt % 10000 = 0 THEN
    COMMIT;
    r := true;
  ELSE
    r := false;
  END IF;
END;
$$ LANGUAGE plpgsql;

DO $$
DECLARE a int; b int; c int; r boolean;
BEGIN
  TRUNCATE foo;
  FOR i IN 1..10000000
  LOOP
    a := (random() * 100)::int;
    b := (random() * 100)::int;
    c := (random() * 100)::int;
    CALL simple_proc3(a, b, c, i, r);
    IF r THEN
      RAISE NOTICE 'committed at % row', i;
    END IF;
  END LOOP;
END;
$$;

I try to insert 10M rows with commit after inserting 10K rows. Execution time on master is ~ 6 minutes 368251,691 ms (06:08,252)

DO $$
DECLARE a int; b int; c int; r boolean;
BEGIN
  TRUNCATE foo;
  FOR i IN 1..10000000
  LOOP
    a := (random() * 100)::int;
    b := (random() * 100)::int;
    c := (random() * 100)::int;
    INSERT INTO foo VALUES(a, b, c);
    IF i % 10000 = 0 THEN
      COMMIT;
      r := true;
    ELSE
      r := false;
    END IF;
    IF r THEN
      RAISE NOTICE 'committed at % row', i;
    END IF;
  END LOOP;
END;
$$;

When I try to remove CALL statement then same work needs less to 2 minutes 99109,511 ms (01:39,110). So this code is three times slower with calling one procedure. There are two significant parts of overhead:

a) internal implementation of CALL statement that doesn't use plan cache well, and it does lot of expensive operations over pg_proc catalogue,

b) wrapper in PL/pgSQL that repeatedly repearse expression string.

Overhead of PL/pgSQL can be reduced by using plan cache after fixing issue with resource owner. I did it, and I introduced "local_resowner" for holding references of plans for CALL statement expressions.

After patching the execution time is reduced to 4 minutes Time: 245852,846 ms (04:05,853). Still the overhead is significant, but it is 30% speedup.

The best case for this patch is about 2x times better performance

CREATE OR REPLACE PROCEDURE public.simple_proc2(a integer, b integer, c integer, cnt int, OUT r boolean)
AS $$
BEGIN
END;
$$ LANGUAGE plpgsql;

DO $$
DECLARE a int; r boolean;
BEGIN
  FOR i IN 1..10000000
  LOOP
    CALL simple_proc2((random()*100)::int, (random()*100)::int, (random()*100)::int, i, r);
  END LOOP;
END;
$$;

Time: 184667,970 ms (03:04,668), master: Time: 417463,457 ms (06:57,463)

On second hand, the worst case is about 10% (probably this overhead can be reduced by creating "local_resowner" only when it is necessary)

CREATE OR REPLACE FUNCTION simple_fx2(a int)
RETURNS int AS $$
BEGIN
  RETURN a + a;
END;
$$ LANGUAGE plpgsql IMMUTABLE STRICT;

DO $$
DECLARE a int;
BEGIN
  FOR i IN 1..10000000
  LOOP
    a := simple_fx2(i);
  END LOOP;
END;
$$;

Time: 5434,808 ms (00:05,435) , master: Time: 4632,762 ms (00:04,633)

Comments, notes, ideas?

Regards

Pavel



Attachment
Hi

I played with the profiler a little bit to get some data - see attached file.  Almost all overhead is related to impossibility to use simple expression evaluation (that has very good performance now).

Probably there is no simple way to reduce this overhead. 

Regards

Pavel
Attachment


pá 1. 1. 2021 v 9:15 odesílatel Pavel Stehule <pavel.stehule@gmail.com> napsal:
Hi

only rebase

rebase

regards

Pavel


Regards

Pavel
Attachment
Pavel Stehule <pavel.stehule@gmail.com> writes:
> [ plpgsql-using-local-resowner-for-call-plans-20200108.patch ]

I took a quick look through this patch, just reading it without
any testing.  A few thoughts:

* Instead of adding an argument to GetCachedPlan and ReleaseCachedPlan,
I think it'd be better to *replace* the useResOwner bool with
a ResourceOwner pointer, with the obvious semantics "do nothing if
it's NULL".  Otherwise you have to explain what it means to pass NULL
with useResOwner = true.  In any case, changing the APIs of these
functions without updating their header comments is not okay.

* I'm not really happy with adding yet another nonorthogonal variant
of SPI_execute_plan.  Maybe better to do something like I did with
SPI_prepare_extended() in commit 844fe9f15, and create a struct with
all the inessential parameters so that we can make future API extensions
without inventing whole new functions.  Remember also that new SPI
functions have to be documented in spi.sgml.

* Do we really need a PG_TRY in exec_toplevel_block?  Not to mention
creating and destroying a ResOwner?  That seems pretty expensive, and it
should be unnecessary for ordinary plpgsql functions.  (I'm also unhappy
that you've utterly falsified that function's comment without doing
anything to update it.)  This is really the part that needs more
work.  I'm not sure that you can sell a speedup of CALL operations
if the penalty is to slow down every other plpgsql function.

* The part of the patch around exec_stmt_call is just about unreadable,
mainly because git diff seems to think that exec_stmt_call is being
changed into make_callstmt_target.  Maybe it'd be less messy if you
put make_callstmt_target after exec_stmt_call.

* Looks like an extra exec_prepare_plan() call snuck into
exec_assign_expr()?

            regards, tom lane



Hi

pá 15. 1. 2021 v 22:46 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:
Pavel Stehule <pavel.stehule@gmail.com> writes:
> [ plpgsql-using-local-resowner-for-call-plans-20200108.patch ]

I took a quick look through this patch, just reading it without
any testing.  A few thoughts:

* Instead of adding an argument to GetCachedPlan and ReleaseCachedPlan,
I think it'd be better to *replace* the useResOwner bool with
a ResourceOwner pointer, with the obvious semantics "do nothing if
it's NULL".  Otherwise you have to explain what it means to pass NULL
with useResOwner = true.  In any case, changing the APIs of these
functions without updating their header comments is not okay.

done


* I'm not really happy with adding yet another nonorthogonal variant
of SPI_execute_plan.  Maybe better to do something like I did with
SPI_prepare_extended() in commit 844fe9f15, and create a struct with
all the inessential parameters so that we can make future API extensions
without inventing whole new functions.  Remember also that new SPI
functions have to be documented in spi.sgml.

done


* Do we really need a PG_TRY in exec_toplevel_block?  Not to mention
creating and destroying a ResOwner?  That seems pretty expensive, and it
should be unnecessary for ordinary plpgsql functions.  (I'm also unhappy
that you've utterly falsified that function's comment without doing
anything to update it.)  This is really the part that needs more
work.  I'm not sure that you can sell a speedup of CALL operations
if the penalty is to slow down every other plpgsql function.

I rewrote this part - now there is no new PG_TRY. local_resowner is created only when routine is executed in non atomic mode
 
* The part of the patch around exec_stmt_call is just about unreadable,
mainly because git diff seems to think that exec_stmt_call is being
changed into make_callstmt_target.  Maybe it'd be less messy if you
put make_callstmt_target after exec_stmt_call.

done


* Looks like an extra exec_prepare_plan() call snuck into
exec_assign_expr()?

fixed

I did performance tests and not the slowdown in the worst case is lower (3-5%) only for execution in non-atomic mode. The performance of atomic mode is the same.

Regards

Pavel




                        regards, tom lane
Attachment
Hi

This is a little bit of an enhanced version of the previous patch. The worst case overhead is reduced almost to zero. The local resource owner is created only when routine is executed in non-atomic mode, and when routine contains a CALL statement.

Regards

Pavel

Attachment


čt 21. 1. 2021 v 14:37 odesílatel Pavel Stehule <pavel.stehule@gmail.com> napsal:
Hi

This is a little bit of an enhanced version of the previous patch. The worst case overhead is reduced almost to zero. The local resource owner is created only when routine is executed in non-atomic mode, and when routine contains a CALL statement.

Sorry. Last patch wasn't tested well.

Regards

Pavel




Regards

Pavel

Attachment
Pavel Stehule <pavel.stehule@gmail.com> writes:
> [ plpgsql-plan-cache-for-call-3.patch ]

Pushed with some additional cleanup.

It strikes me that we ought to get rid of SPI_execute_with_receiver
(which has been added since v13) in favor of a "SPI_execute_extended"
that shares the same options struct as SPI_execute_plan_extended.
But I left that for tomorrow.

            regards, tom lane





út 26. 1. 2021 v 4:33 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:
Pavel Stehule <pavel.stehule@gmail.com> writes:
> [ plpgsql-plan-cache-for-call-3.patch ]

Pushed with some additional cleanup.

Thank you

Pavel


It strikes me that we ought to get rid of SPI_execute_with_receiver
(which has been added since v13) in favor of a "SPI_execute_extended"
that shares the same options struct as SPI_execute_plan_extended.
But I left that for tomorrow.

                        regards, tom lane