Re: plpgsql functions organisation - Mailing list pgsql-general

From Adrian Klaver
Subject Re: plpgsql functions organisation
Date
Msg-id 55463C20.3080408@aklaver.com
Whole thread Raw
In response to Re: plpgsql functions organisation  (Melvin Davidson <melvin6925@gmail.com>)
List pgsql-general
On 05/03/2015 07:14 AM, Melvin Davidson wrote:
> The point was to show that yes, function calls take time, and using sub
> functions take even more time. I am not about to write an additional
> more detailed example just to show the same results. If you are in
> doubt, I respectfully suggest you do your own testing.

Can't resist a challenge. I took an existing function that calculates an
aggregated attendance count for a student or all enrolled students over
a date period and modified it to call sub functions. There are two sub
functions, one that finds the students enrolled over a period(which by
the way calls another function) and dates range they where enrolled. The
other calculates the aggregate values for each student. The original
function is student_attendance, the modified student_attendance_sub. The
results are below, where the first argument is the student_id(where 0
equals all students). The all students version returns 600 rows, the
single student 16 rows.


hplc=> select * from student_attendance(0, '2005-01-17', '2008-01-31');
NOTICE:  Time 00:00:00.204865s
hplc=> select * from student_attendance_sub(0, '2005-01-17', '2008-01-31');
NOTICE:  Time 00:00:00.014101s
hplc=> select * from student_attendance(0, '2005-01-17', '2008-01-31');
NOTICE:  Time 00:00:00.041182s
hplc=> select * from student_attendance_sub(0, '2005-01-17', '2008-01-31');
NOTICE:  Time 00:00:00.011385s
hplc=> select * from student_attendance(0, '2005-01-17', '2008-01-31');
NOTICE:  Time 00:00:00.040762s
hplc=> select * from student_attendance_sub(0, '2005-01-17', '2008-01-31');
NOTICE:  Time 00:00:00.016506s

hplc=> select * from student_attendance(1, '2005-01-17', '2008-01-31');
NOTICE:  Time 00:00:00.00291s
hplc=> select * from student_attendance_sub(1, '2005-01-17', '2008-01-31');
NOTICE:  Time 00:00:00.004125s
hplc=> select * from student_attendance(1, '2005-01-17', '2008-01-31');
NOTICE:  Time 00:00:00.001907s
hplc=> select * from student_attendance_sub(1, '2005-01-17', '2008-01-31');
NOTICE:  Time 00:00:00.003476s
hplc=> select * from student_attendance(1, '2005-01-17', '2008-01-31');
NOTICE:  Time 00:00:00.00597s
hplc=> select * from student_attendance_sub(1, '2005-01-17', '2008-01-31');
NOTICE:  Time 00:00:00.003986s

Definite difference in the all students run, probably because one of the
called functions is used in a LOOP and caching applies.


>
> On Sun, May 3, 2015 at 5:26 AM, Alban Hertroys <haramrae@gmail.com
> <mailto:haramrae@gmail.com>> wrote:
>
>
>     > On 03 May 2015, at 2:56, Melvin Davidson <melvin6925@gmail.com <mailto:melvin6925@gmail.com>> wrote:
>     >
>     > OK, Here is a simple example that shows the difference between using a self contained function  and
>     > one that calls sub functions.
>     >
>     > After loading all the functions below, repeat each of the EXPLAIN statements a few times and note that
>     > callsubs takes almost TWICE as long to execute as nosub.
>     >
>     > CREATE OR REPLACE FUNCTION nosub(text)
>     >   RETURNS void AS
>     > $BODY$
>     ...
>     >     IF LENGTH(p_in_str) <= 6
>     >         THEN RAISE NOTICE 'Hi %', p_in_str;
>     >     ELSE
>     >         RAISE NOTICE 'Hello %', p_in_str;
>     >     END IF;
>     >
>     >     RETURN;
>     > END;
>     > $BODY$
>     …
>
>
>     >  CREATE OR REPLACE FUNCTION called1(text)
>     >   RETURNS void AS
>     > $BODY$
>     ...
>     >     RAISE NOTICE 'Hi %', p_in_str1;
>     >
>     >     RETURN;
>     > END;
>     > $BODY$
>     …
>
>     >  CREATE OR REPLACE FUNCTION called2(text)
>     >   RETURNS void AS
>     > $BODY$
>     ...
>     >     RAISE NOTICE 'Hello %', p_in_str2;
>     >
>     >     RETURN;
>     > END;
>     ...
>
>
>     That's a rather uninteresting experiment, as all it does is call a
>     function and raise a notice. Relative to what the functions do, the
>     function call itself takes a significant amount of time. No surprise
>     there, you'll see something similar in any language, even C. All
>     you're showing is that calling a function takes some amount of time > 0.
>
>     In C, a function call needs to look up an address to jump to, in
>     plpgsql the database needs to look up the function body in a table.
>     If the function is small and atomic it often gets called from
>     multiple other functions and is probably cached anyway. The main
>     difference between C and plpgsql here is that the latter is an
>     interpreted language, so it does need to read in the entire function
>     body after a call - which I'd expect to be quite a bit faster with a
>     smaller (atomic) function body, especially when it hasn't been
>     cached yet.
>
>     So far I haven't been convinced.
>
>     An actual use-case where the functions actually do something would
>     be far more interesting. I doubt anybody writes functions just to
>     raise a notice. I expect that in reality most plpgsql functions
>     perform database queries and do something with the result. In such
>     cases, function call overhead could be significant if the call is
>     done for each record in a result set, for example. And even then
>     it's worth considering whether that matters to your situation enough
>     that it outweighs the usual benefits of code separation.
>
>
>      > On Sat, May 2, 2015 at 7:37 PM, Adrian Klaver
>     <adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>> wrote:
>      > On 05/02/2015 03:28 PM, Bill Moran wrote:
>      > On Sat, 02 May 2015 15:06:24 -0700
>      > Adrian Klaver <adrian.klaver@aklaver.com
>     <mailto:adrian.klaver@aklaver.com>> wrote:
>      >
>      > On 05/02/2015 02:07 PM, Jeff Janes wrote:
>      > On Sat, May 2, 2015 at 1:05 PM, Adrian Klaver
>     <adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>
>      > <mailto:adrian.klaver@aklaver.com
>     <mailto:adrian.klaver@aklaver.com>>> wrote:
>      >
>      >      On 05/02/2015 10:12 AM, Melvin Davidson wrote:
>      >
>      >          AFAIK, you cannot "package" functions in  PostgreSQL,
>     but it is
>      >          possible to
>      >          call a function from within a function.
>      >
>      >          That being said, I would seriously look at how and why
>     you are
>      >          writing
>      >          your functions
>      >          as functions that call other functions are not very
>     efficient.
>      >
>      >
>      >      I am not following. That is what packaging is about,
>     separating out
>      >      'units of work' so they can be combined as needed. Part of
>     that is
>      >      using existing functions in new functions/classes. In fact
>     in the
>      >      Postgres source I see this in many places. Now it is entirely
>      >      possible I missed a memo, so I am open to a more detailed
>      >      explanation of the inefficiencies involved.
>      >
>      >
>      > The Postgres source is written in C, not in plpgsql.  C has a good
>      > optimizing compiler and plpgsql doesn't.
>      >
>      > Does this actually matter?  I am a biologist that backed into
>     computing,
>      > so I realize I am weak on the fundamentals. Still the scientist in me
>      > wants data backing assertions. As I understand it plpgsql works
>     close to
>      > the server and is optimized to do so.  I know writing in C would be a
>      > better solution. Still is calling plpgsql functions inside plpgsql
>      > really a bad thing when just considering plpgsql?
>      >
>      > The answer to that is the same answer to so many other things: it
>     depends.
>      >
>      > plpgsql functions are slower than C. They also lack a lot of language
>      > features that C has. That being said, if they're meeting your
>     needs, then
>      > don't worry about it. plpgsql is around because for most people,
>     it works
>      > well enough. There are certainly cases when you want to create
>     very complex
>      > logic in the database and plpgsql is liable to make that
>     difficult. But
>      > there are a lot of cases where having to manage pointers and a build
>      > environment and all the things that go with C aren't justified,
>     because
>      > plpgsql has none of that complexity. There are advantages both ways.
>      >
>      > The beauty of PostgreSQL is that you have both available and you
>      > can choose whichever is best for your situation.
>      >
>      > Agreed, though in my case I drop into plpythonu when I want more
>     complex solutions.
>      >
>      >
>      >
>      >
>      > --
>      > Adrian Klaver
>      > adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>
>      >
>      >
>      >
>      > --
>      > Melvin Davidson
>      > I reserve the right to fantasize.  Whether or not you
>      > wish to share my fantasy is entirely up to you.
>
>     Alban Hertroys
>     --
>     If you can't see the forest for the trees,
>     cut the trees and you'll find there is no forest.
>
>
>
>
> --
> *Melvin Davidson*
> I reserve the right to fantasize.  Whether or not you
> wish to share my fantasy is entirely up to you.


--
Adrian Klaver
adrian.klaver@aklaver.com


pgsql-general by date:

Previous
From: Guillaume Lelarge
Date:
Subject: Re: schedulers
Next
From: Nanker Phelge
Date:
Subject: Errors using JDBC batchUpdate with plpgsql function