Hi,
here is another patch related to using CALL statement inside PL/pgSQL code.
A repeated using of CALL statement is expensive. How much?
I wrote synthetic test:
CREATE TABLE foo(a int, b int, c int);
CREATE OR REPLACE PROCEDURE public.simple_proc3(a integer, b integer, c integer, cnt int, OUT r boolean)
AS $$
BEGIN
INSERT INTO foo VALUES(a, b, c);
IF cnt % 10000 = 0 THEN
COMMIT;
r := true;
ELSE
r := false;
END IF;
END;
$$ LANGUAGE plpgsql;
DO $$
DECLARE a int; b int; c int; r boolean;
BEGIN
TRUNCATE foo;
FOR i IN 1..10000000
LOOP
a := (random() * 100)::int;
b := (random() * 100)::int;
c := (random() * 100)::int;
CALL simple_proc3(a, b, c, i, r);
IF r THEN
RAISE NOTICE 'committed at % row', i;
END IF;
END LOOP;
END;
$$;
I try to insert 10M rows with commit after inserting 10K rows. Execution time on master is ~ 6 minutes 368251,691 ms (06:08,252)
DO $$
DECLARE a int; b int; c int; r boolean;
BEGIN
TRUNCATE foo;
FOR i IN 1..10000000
LOOP
a := (random() * 100)::int;
b := (random() * 100)::int;
c := (random() * 100)::int;
INSERT INTO foo VALUES(a, b, c);
IF i % 10000 = 0 THEN
COMMIT;
r := true;
ELSE
r := false;
END IF;
IF r THEN
RAISE NOTICE 'committed at % row', i;
END IF;
END LOOP;
END;
$$;
When I try to remove CALL statement then same work needs less to 2 minutes 99109,511 ms (01:39,110). So this code is three times slower with calling one procedure. There are two significant parts of overhead:
a) internal implementation of CALL statement that doesn't use plan cache well, and it does lot of expensive operations over pg_proc catalogue,
b) wrapper in PL/pgSQL that repeatedly repearse expression string.
Overhead of PL/pgSQL can be reduced by using plan cache after fixing issue with resource owner. I did it, and I introduced "local_resowner" for holding references of plans for CALL statement expressions.
After patching the execution time is reduced to 4 minutes Time: 245852,846 ms (04:05,853). Still the overhead is significant, but it is 30% speedup.
The best case for this patch is about 2x times better performance
CREATE OR REPLACE PROCEDURE public.simple_proc2(a integer, b integer, c integer, cnt int, OUT r boolean)
AS $$
BEGIN
END;
$$ LANGUAGE plpgsql;
DO $$
DECLARE a int; r boolean;
BEGIN
FOR i IN 1..10000000
LOOP
CALL simple_proc2((random()*100)::int, (random()*100)::int, (random()*100)::int, i, r);
END LOOP;
END;
$$;
Time: 184667,970 ms (03:04,668), master: Time: 417463,457 ms (06:57,463)
On second hand, the worst case is about 10% (probably this overhead can be reduced by creating "local_resowner" only when it is necessary)
CREATE OR REPLACE FUNCTION simple_fx2(a int)
RETURNS int AS $$
BEGIN
RETURN a + a;
END;
$$ LANGUAGE plpgsql IMMUTABLE STRICT;
DO $$
DECLARE a int;
BEGIN
FOR i IN 1..10000000
LOOP
a := simple_fx2(i);
END LOOP;
END;
$$;
Time: 5434,808 ms (00:05,435) , master: Time: 4632,762 ms (00:04,633)
Comments, notes, ideas?
Regards
Pavel