patch: reduce overhead of execution of CALL statement in no atomic mode from PL/pgSQL - Mailing list pgsql-hackers

From Pavel Stehule
Subject patch: reduce overhead of execution of CALL statement in no atomic mode from PL/pgSQL
Date
Msg-id CAFj8pRCLPdDAETvR7Po7gC5y_ibkn_-bOzbeJb39WHms01194Q@mail.gmail.com
Whole thread Raw
Responses Re: patch: reduce overhead of execution of CALL statement in no atomic mode from PL/pgSQL  (Pavel Stehule <pavel.stehule@gmail.com>)
Re: patch: reduce overhead of execution of CALL statement in no atomic mode from PL/pgSQL  (Pavel Stehule <pavel.stehule@gmail.com>)
List pgsql-hackers
Hi,

here is another patch related to using CALL statement inside PL/pgSQL code.

A repeated using of CALL statement is expensive. How much?

I wrote synthetic test:

CREATE TABLE foo(a int, b int, c int);

CREATE OR REPLACE PROCEDURE public.simple_proc3(a integer, b integer, c integer, cnt int, OUT r boolean)
AS $$
BEGIN
  INSERT INTO foo VALUES(a, b, c);
  IF cnt % 10000 = 0 THEN
    COMMIT;
    r := true;
  ELSE
    r := false;
  END IF;
END;
$$ LANGUAGE plpgsql;

DO $$
DECLARE a int; b int; c int; r boolean;
BEGIN
  TRUNCATE foo;
  FOR i IN 1..10000000
  LOOP
    a := (random() * 100)::int;
    b := (random() * 100)::int;
    c := (random() * 100)::int;
    CALL simple_proc3(a, b, c, i, r);
    IF r THEN
      RAISE NOTICE 'committed at % row', i;
    END IF;
  END LOOP;
END;
$$;

I try to insert 10M rows with commit after inserting 10K rows. Execution time on master is ~ 6 minutes 368251,691 ms (06:08,252)

DO $$
DECLARE a int; b int; c int; r boolean;
BEGIN
  TRUNCATE foo;
  FOR i IN 1..10000000
  LOOP
    a := (random() * 100)::int;
    b := (random() * 100)::int;
    c := (random() * 100)::int;
    INSERT INTO foo VALUES(a, b, c);
    IF i % 10000 = 0 THEN
      COMMIT;
      r := true;
    ELSE
      r := false;
    END IF;
    IF r THEN
      RAISE NOTICE 'committed at % row', i;
    END IF;
  END LOOP;
END;
$$;

When I try to remove CALL statement then same work needs less to 2 minutes 99109,511 ms (01:39,110). So this code is three times slower with calling one procedure. There are two significant parts of overhead:

a) internal implementation of CALL statement that doesn't use plan cache well, and it does lot of expensive operations over pg_proc catalogue,

b) wrapper in PL/pgSQL that repeatedly repearse expression string.

Overhead of PL/pgSQL can be reduced by using plan cache after fixing issue with resource owner. I did it, and I introduced "local_resowner" for holding references of plans for CALL statement expressions.

After patching the execution time is reduced to 4 minutes Time: 245852,846 ms (04:05,853). Still the overhead is significant, but it is 30% speedup.

The best case for this patch is about 2x times better performance

CREATE OR REPLACE PROCEDURE public.simple_proc2(a integer, b integer, c integer, cnt int, OUT r boolean)
AS $$
BEGIN
END;
$$ LANGUAGE plpgsql;

DO $$
DECLARE a int; r boolean;
BEGIN
  FOR i IN 1..10000000
  LOOP
    CALL simple_proc2((random()*100)::int, (random()*100)::int, (random()*100)::int, i, r);
  END LOOP;
END;
$$;

Time: 184667,970 ms (03:04,668), master: Time: 417463,457 ms (06:57,463)

On second hand, the worst case is about 10% (probably this overhead can be reduced by creating "local_resowner" only when it is necessary)

CREATE OR REPLACE FUNCTION simple_fx2(a int)
RETURNS int AS $$
BEGIN
  RETURN a + a;
END;
$$ LANGUAGE plpgsql IMMUTABLE STRICT;

DO $$
DECLARE a int;
BEGIN
  FOR i IN 1..10000000
  LOOP
    a := simple_fx2(i);
  END LOOP;
END;
$$;

Time: 5434,808 ms (00:05,435) , master: Time: 4632,762 ms (00:04,633)

Comments, notes, ideas?

Regards

Pavel



Attachment

pgsql-hackers by date:

Previous
From: Heikki Linnakangas
Date:
Subject: Re: hash_array_extended() needs to pass down collation
Next
From: Erik Rijkers
Date:
Subject: Re: Split copy.c