Re: Replacing Cursors with Temporary Tables - Mailing list pgsql-performance

From Eliot Gable
Subject Re: Replacing Cursors with Temporary Tables
Date
Msg-id n2qbf6923ed1004231342l61f766fax8fb2a482aa3b0faf@mail.gmail.com
In response to Re: Replacing Cursors with Temporary Tables  (Eliot Gable <egable+pgsql-performance@gmail.com>)
Responses Re: Replacing Cursors with Temporary Tables  (Merlin Moncure <mmoncure@gmail.com>)
Re: Replacing Cursors with Temporary Tables  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-performance
To answer the question of whether calling a stored procedure adds any significant overhead, I built a test case; the short answer is that it does:

CREATE OR REPLACE FUNCTION Test1() RETURNS INTEGER AS
$BODY$
DECLARE
    temp INTEGER;
BEGIN
    FOR i IN 1..1000 LOOP
        SELECT 1 AS id INTO temp;
    END LOOP;
    RETURN 1;
END;
$BODY$
LANGUAGE plpgsql;

CREATE OR REPLACE FUNCTION Test2A() RETURNS SETOF INTEGER AS
$BODY$
BEGIN
    RETURN QUERY SELECT 1 AS id;
END;
$BODY$
LANGUAGE plpgsql;

CREATE OR REPLACE FUNCTION Test2B() RETURNS INTEGER AS
$BODY$
DECLARE
    temp INTEGER;
BEGIN
    FOR i IN 1..1000 LOOP
        temp := Test2A();
    END LOOP;
    RETURN 1;
END;
$BODY$
LANGUAGE plpgsql;


EXPLAIN ANALYZE SELECT * FROM Test1();
"Function Scan on test1  (cost=0.00..0.26 rows=1 width=4) (actual time=6.568..6.569 rows=1 loops=1)"
"Total runtime: 6.585 ms"


EXPLAIN ANALYZE SELECT * FROM Test2B();
"Function Scan on test2b  (cost=0.00..0.26 rows=1 width=4) (actual time=29.006..29.007 rows=1 loops=1)"
"Total runtime: 29.020 ms"


So, when chasing milliseconds, don't call sub-functions if it can realistically and easily be avoided. I have only one operation/algorithm broken out into its own stored procedure, because I call it in about 8 different places and it is 900+ lines long.

While everything else could be broken out into separate stored procedures to make the whole body of code easier to analyze and probably easier to maintain, it does not make sense from a performance perspective: each logical group of actions that would go into its own stored procedure is only ever used once in the whole algorithm, so there is no real code re-use to gain. Further, since the containing stored procedure is itself called hundreds or even thousands of times per second on a production system, the nested calls to individual sub-procedures would just add extra overhead for no real gain. And, from these tests, it would be significant overhead.
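One caveat worth checking on your own version (this is not from my tests above, and the names are illustrative): simple LANGUAGE sql functions, unlike plpgsql ones, can sometimes be inlined by the planner, which may remove most of the per-call overhead for trivial helpers. A scalar sketch:

```sql
-- Hypothetical scalar variant of Test2A written in SQL rather than plpgsql.
-- A single-SELECT, non-volatile SQL function like this is a candidate for
-- inlining into the calling query, avoiding the function-call machinery.
CREATE OR REPLACE FUNCTION Test2A_sql() RETURNS INTEGER AS
$BODY$
    SELECT 1 AS id;
$BODY$
LANGUAGE sql IMMUTABLE;

-- EXPLAIN SELECT Test2A_sql();  -- check whether the body was expanded in place
```

Whether inlining actually happens depends on the function's volatility and body, so it would need to be verified with EXPLAIN before relying on it.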



On Thu, Apr 22, 2010 at 4:57 PM, Eliot Gable <egable+pgsql-performance@gmail.com> wrote:
I appreciate all the comments.

I will perform some benchmarking before doing the rewrite to be certain of how it will impact performance. At the very least, I think I can say for near-certain now that the indexes are not going to help me, given the particular queries I am dealing with and the limited number of records the temp tables will have, combined with the limited number of times I will re-use them.


On Thu, Apr 22, 2010 at 10:42 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
On Thu, Apr 22, 2010 at 10:11 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
> The timings are similar, but the array returning case:
> *)  runs in a single statement.  If this is executed from the client
> that means less round trips
> *) can be passed around as a variable between functions.  temp table
> requires re-query
> *) makes some things easier/cheaper, such as counting the array -- you get
> to call the basically free array_upper()
> *) makes some things harder.  specifically dealing with arrays on the
> client is a pain UNLESS you expand the array w/unnest() or use
> libpqtypes
> *) can nest. you can trivially nest complicated sets w/arrays
> *) does not require explicit transaction mgmt
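The array-returning pattern in the list above can be sketched roughly as follows (function and column names here are illustrative, not from the thread):

```sql
-- Aggregate a result set into a single array value that can be passed
-- between functions with no temporary table and no re-query.
CREATE OR REPLACE FUNCTION get_ids() RETURNS INTEGER[] AS
$BODY$
    SELECT array_agg(id) FROM generate_series(1, 5) AS s(id);
$BODY$
LANGUAGE sql;

-- Counting the result is basically free:
SELECT array_upper(get_ids(), 1);

-- Expanding back into rows for a client that wants a plain result set:
SELECT unnest(get_ids());
```

array_agg() and unnest() require 8.4 or later; on older versions the same shape can be built with ARRAY(SELECT ...).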


I neglected to mention perhaps the most important point about the array method:
*) does not rely on any temporary resources.

If you write a lot of plpgsql, you will start to appreciate the
difference in execution time between planned and unplanned functions.
The first time you run a function in a database session, it has to be
parsed and planned.  The planning time in particular for large-ish
functions that touch a lot of objects can exceed the execution time of
the function.  Depending on _any_ temporary resources causes plan mgmt
issues because the database detects that a table in the old plan is
gone ('on commit drop') and has to re-plan.   If your functions are
complex/long and you are counting milliseconds, then that alone should
be enough to dump any approach that depends on temp tables.
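A minimal sketch of the dependency being described (table and function names are made up for illustration): a function whose cached plan references an ON COMMIT DROP temp table loses that table at transaction end, so the next call in a new transaction forces a re-plan.

```sql
-- Each call creates and queries a transaction-scoped temp table. When the
-- transaction commits, 'scratch' is dropped, invalidating any cached plan
-- that referenced it; the function is re-planned on the next call.
CREATE OR REPLACE FUNCTION uses_temp() RETURNS BIGINT AS
$BODY$
BEGIN
    CREATE TEMP TABLE scratch (id INTEGER) ON COMMIT DROP;
    INSERT INTO scratch SELECT g FROM generate_series(1, 100) AS g;
    RETURN (SELECT count(*) FROM scratch);
END;
$BODY$
LANGUAGE plpgsql;
```

For a long function that touches many objects, that repeated planning cost is exactly the overhead being warned about here.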

merlin



--
Eliot Gable

"We do not inherit the Earth from our ancestors: we borrow it from our children." ~David Brower

"I decided the words were too conservative for me. We're not borrowing from our children, we're stealing from them--and it's not even considered to be a crime." ~David Brower

"Esse oportet ut vivas, non vivere ut edas." (Thou shouldst eat to live; not live to eat.) ~Marcus Tullius Cicero



