Thread: Maintaining state across function calls

Maintaining state across function calls

From

matt@byrney.com

Date:

19 November 2012, 15:41:48

I want to process all the records in a table through a C-language (well,
C++) function (i.e. one function call per row of the table) in such a way
that the function hangs onto its internal state across calls.  Something
like

SELECT my_function(a, b, c) FROM my_table ORDER BY d;

The value returned in the last row of the table would be the result I'm
looking for.  (This could be neatened up by using a custom aggregate and
putting my calculation in the sfunc but that's a minor detail).

The question is: what's the "best practice" way of letting a
C/C++-language function hang onto internal state across calls?  So far I'm
thinking something along the lines of:

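Datum my_function(int a, int b, int c, int reset)
{
  static my_data *p = NULL;  // internal state, kept between calls
  if (reset)  // (re)initialise internal state
  {
    delete p;
    p = NULL;
  }
  else
  {
    if (!p)
    {
      p = new my_data;  // first call since the last reset
    }
    // make use of internal state to do calculations or whatever
  }
  return (Datum) 0;  // placeholder: the calculated value would go here
}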

The user would be responsible for calling my_function with "reset" set to
true to wipe previous internal state before using the function in a new
query; doing this also frees the memory associated with the function.
This system is of course prone to leakage if the user forgets to wipe the
internal state after use, but it will only leak sizeof(my_data) per
connection, and the OS will garbage-collect all that when the connection
dies anyway.

Alternatively, use this in a custom aggregate and make the ffunc do the
garbage collection, which should prevent leakage altogether.

Is this a reasonable thing to do?  What are the risks?  Is there a more
"best-practice" way to achieve the same result?

Many thanks,

Matt

Re: Maintaining state across function calls

From

Craig Ringer

Date:

19 November 2012, 16:29:21

On 11/19/2012 08:41 PM, matt@byrney.com wrote:
> I want to process all the records in a table through a C-language (well,
> C++) function (i.e. one function call per row of the table) in such a way
> that the function hangs onto its internal state across calls.  Something
> like
>
> SELECT my_function(a, b, c) FROM my_table ORDER BY d;
>
> The value returned in the last row of the table would be the result I'm
> looking for.  (This could be neatened up by using a custom aggregate and
> putting my calculation in the sfunc but that's a minor detail).
[snip]
> Alternatively, use this in a custom aggregate and make the ffunc do the
> garbage collection, which should prevent leakage altogether.
You don't generally need to do this cleanup yourself. Use appropriate
palloc memory contexts and it'll be done for you when the memory context
is destroyed.

I would want to implement this as an aggregate using the standard
aggregate / window function machinery. Have a look at how the existing
aggregates like string_agg are implemented in the Pg source code.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Maintaining state across function calls

From

matt@byrney.com

Date:

19 November 2012, 17:09:32

> On 11/19/2012 08:41 PM, matt@byrney.com wrote:
>> I want to process all the records in a table through a C-language (well,
>> C++) function (i.e. one function call per row of the table) in such a way
>> that the function hangs onto its internal state across calls.  Something
>> like
>>
>> SELECT my_function(a, b, c) FROM my_table ORDER BY d;
>>
>> The value returned in the last row of the table would be the result I'm
>> looking for.  (This could be neatened up by using a custom aggregate and
>> putting my calculation in the sfunc but that's a minor detail).
> [snip]
>> Alternatively, use this in a custom aggregate and make the ffunc do the
>> garbage collection, which should prevent leakage altogether.
> You don't generally need to do this cleanup yourself. Use appropriate
> palloc memory contexts and it'll be done for you when the memory context
> is destroyed.
>
> I would want to implement this as an aggregate using the standard
> aggregate / window function machinery. Have a look at how the existing
> aggregates like string_agg are implemented in the Pg source code.

Thanks for your reply.  A follow-up question: to use the palloc/pfree
functions with a C++ STL container, do I simply give the container an
allocator which uses palloc and pfree instead of the default allocator,
which uses new and delete?

Matt
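
Mechanically, a standard container does accept a custom allocator as a template argument, so the question above can be sketched as follows. This is only an illustration under stated assumptions: `fake_palloc` and `fake_pfree` are hypothetical stand-ins (backed by malloc/free) so the code compiles without PostgreSQL headers, and `PgAllocator` and `sum_with_custom_allocator` are names invented here. Note that the real palloc reports failure through the backend's longjmp()-based error handling rather than by returning NULL, which is exactly the interaction later replies in this thread warn about, so this should not be read as a recommended design.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical stand-ins for palloc/pfree. They count calls and forward to
// malloc/free; they are NOT the PostgreSQL functions.
static std::size_t g_allocs = 0;
static std::size_t g_frees = 0;

static void *fake_palloc(std::size_t n) { ++g_allocs; return std::malloc(n); }
static void fake_pfree(void *p) { ++g_frees; std::free(p); }

// Minimal C++11 allocator that routes container memory through the two
// functions above instead of new/delete.
template <typename T>
struct PgAllocator {
    using value_type = T;

    PgAllocator() noexcept = default;
    template <typename U>
    PgAllocator(const PgAllocator<U> &) noexcept {}

    T *allocate(std::size_t n) {
        void *p = fake_palloc(n * sizeof(T));
        if (p == nullptr) throw std::bad_alloc();
        return static_cast<T *>(p);
    }
    void deallocate(T *p, std::size_t) noexcept { fake_pfree(p); }
};

// The allocator holds no state, so any two instances are interchangeable.
template <typename T, typename U>
bool operator==(const PgAllocator<T> &, const PgAllocator<U> &) noexcept { return true; }
template <typename T, typename U>
bool operator!=(const PgAllocator<T> &, const PgAllocator<U> &) noexcept { return false; }

// Sums 1..n using a vector whose storage comes from the custom allocator.
int sum_with_custom_allocator(int n) {
    assert(n >= 0);
    std::vector<int, PgAllocator<int>> v;
    for (int i = 1; i <= n; ++i) v.push_back(i);
    int total = 0;
    for (int x : v) total += x;
    return total;
}
```

The vector's destructor is what returns the memory here. If control left the function by longjmp() instead of a normal return or a C++ exception, that destructor would not run.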

Re: Maintaining state across function calls

From

Tom Lane

Date:

19 November 2012, 18:28:58

matt@byrney.com writes:
> The question is: what's the "best practice" way of letting a
> C/C++-language function hang onto internal state across calls?

A static variable for that is a really horrid idea.  Instead use
fcinfo->flinfo->fn_extra to point to some workspace palloc'd in the
appropriate context.  If you grep the PG sources for fn_extra you'll
find plenty of examples.

            regards, tom lane
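
The shape of the fn_extra pattern can be sketched without the PostgreSQL headers. Everything below is a stand-in written for illustration: `FakeFmgrInfo` and `FakeFunctionCallInfo` mimic only the `flinfo` and `fn_extra` fields named above, `RunningState`, `running_total` and `release_state` are hypothetical, and calloc/free stand in for allocation in a memory context. The real examples are the ones found by grepping the PG sources for fn_extra.

```cpp
#include <cassert>
#include <cstdlib>

// Stand-ins, NOT the real PostgreSQL definitions. They mimic only what the
// pattern relies on: every call site has its own flinfo, and fn_extra
// starts out null.
struct FakeFmgrInfo { void *fn_extra = nullptr; };
struct FakeFunctionCallInfo { FakeFmgrInfo *flinfo; };

// Hypothetical workspace kept between calls.
struct RunningState { long total; long calls; };

// Adds `value` to a running total stored behind fn_extra.
long running_total(FakeFunctionCallInfo *fcinfo, int value) {
    assert(fcinfo != nullptr && fcinfo->flinfo != nullptr);
    RunningState *st = static_cast<RunningState *>(fcinfo->flinfo->fn_extra);
    if (st == nullptr) {
        // First call from this call site: create a zeroed workspace.
        // Assumption: a real extension would allocate this in the memory
        // context fcinfo->flinfo->fn_mcxt (for example with
        // MemoryContextAllocZero) rather than with calloc.
        st = static_cast<RunningState *>(std::calloc(1, sizeof(RunningState)));
        if (st == nullptr) std::abort();
        fcinfo->flinfo->fn_extra = st;
    }
    st->total += value;
    st->calls += 1;
    return st->total;
}

// Stand-in for the cleanup that destroying the memory context would do.
void release_state(FakeFmgrInfo *flinfo) {
    std::free(flinfo->fn_extra);
    flinfo->fn_extra = nullptr;
}
```

Because the state hangs off the call site rather than off the function, two uses of the function keep separate totals, and no user-visible "reset" argument is needed.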

Re: Maintaining state across function calls

From

matt@byrney.com

Date:

19 November 2012, 21:37:12

> matt@byrney.com writes:
>> The question is: what's the "best practice" way of letting a
>> C/C++-language function hang onto internal state across calls?
>
> A static variable for that is a really horrid idea.  Instead use
> fcinfo->flinfo->fn_extra to point to some workspace palloc'd in the
> appropriate context.  If you grep the PG sources for fn_extra you'll
> find plenty of examples.
>
>             regards, tom lane
>

Thanks for this.  Out of curiosity, why is a static a bad way to do this?

Re: Maintaining state across function calls

From

Tom Lane

Date:

19 November 2012, 22:37:25

matt@byrney.com writes:
> Thanks for this.  Out of curiosity, why is a static a bad way to do this?

Well, it wouldn't allow more than one instance of the function per
query, and it wouldn't reset correctly after an error, and surely you
agree that your proposal of making the user do a separate "reset" step
is an unreliable and unpleasant-to-use kluge.

            regards, tom lane
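
The first objection can be reproduced in a few lines of plain C++. The sketch below simulates a query that uses the function twice per row, once on each of two columns; the function names and the loops standing in for the executor are assumptions made for illustration, not PostgreSQL code.

```cpp
#include <cassert>

// Version 1: state in a function-local static, as in the original proposal.
long sum_static(int value) {
    static long total = 0;
    total += value;
    return total;
}

// Version 2: state owned by the caller, one object per use of the function.
struct SumState { long total = 0; };

long sum_with_state(SumState &st, int value) {
    st.total += value;
    return st.total;
}

// Simulates a query that applies the function to column a and to column b
// on every row, and returns the last value computed for column a.
long last_a_static(const int *a, const int *b, int n) {
    assert(n >= 0);
    long result = 0;
    for (int i = 0; i < n; ++i) {
        result = sum_static(a[i]);
        sum_static(b[i]);  // second instance shares the same static
    }
    return result;
}

// Same simulated query, but each instance has its own state object.
long last_a_separate(const int *a, const int *b, int n) {
    assert(n >= 0);
    SumState sa, sb;
    long result = 0;
    for (int i = 0; i < n; ++i) {
        result = sum_with_state(sa, a[i]);
        sum_with_state(sb, b[i]);
    }
    return result;
}
```

With a = {1, 2, 3} and b = {10, 20, 30}, the version with separate state returns 6 for column a on every run, while the static version mixes the two columns together and also carries its total over into the next run. The same carry-over is what would be left behind if a query failed partway through.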

Re: Maintaining state across function calls

From

Craig Ringer

Date:

20 November 2012, 04:30:32

On 11/19/2012 10:09 PM, matt@byrney.com wrote:
> Thanks for your reply.  A follow-up question: to use the palloc/pfree
> functions with a C++ STL container, do I simply give the container an
> allocator which uses palloc and pfree instead of the default allocator,
> which uses new and delete?
If at all possible, isolate your C++ code from the PostgreSQL aggregate
implementation. Pass the C++ code pre-allocated buffers to work with if
you can, and manage the allocations in the Pg C code. Turn your C++ code
into a library that presents only `extern "C"` interfaces and opaque types
if you can.

C++ exception handling and the PostgreSQL backend's longjmp() based
error handling will interact in exciting and interesting ways. Avoid
calling `palloc`, `pfree` etc from within C++ if you can. If you really
must, ensure that your C++ code doesn't use any RAII, stack-allocated
objects with dtors, etc.

Otherwise you'll have to translate error handling mechanisms at every
boundary between C++ and Pg code, something I'm not even certain is
possible to do reliably.
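
A minimal sketch of the "`extern "C"` interfaces and opaque types" idea follows. All names here (`Accumulator`, `acc_handle`, `acc_create`, `acc_add`, `acc_sum`, `acc_destroy`) are hypothetical, and the error-code convention is an assumption. The point is only that the C++ side may use exceptions and destructors internally as long as none of them cross the boundary.

```cpp
#include <cassert>
#include <new>
#include <stdexcept>
#include <vector>

// ---- C++ side: ordinary C++, free to use RAII and exceptions internally.
class Accumulator {
public:
    void add(int v) {
        if (v < 0) throw std::invalid_argument("negative input");
        values_.push_back(v);
    }
    long sum() const {
        long total = 0;
        for (int v : values_) total += v;
        return total;
    }
private:
    std::vector<int> values_;
};

// ---- Boundary: C callers see only an opaque handle and integer status
// codes (0 = success, -1 = failure). No exception leaves these functions.
extern "C" {

struct acc_handle { Accumulator impl; };  // a C header would only forward-declare this

acc_handle *acc_create(void) {
    try { return new acc_handle(); }
    catch (...) { return nullptr; }
}

int acc_add(acc_handle *h, int v) {
    if (h == nullptr) return -1;
    try { h->impl.add(v); return 0; }
    catch (...) { return -1; }
}

int acc_sum(const acc_handle *h, long *out) {
    if (h == nullptr || out == nullptr) return -1;
    try { *out = h->impl.sum(); return 0; }
    catch (...) { return -1; }
}

void acc_destroy(acc_handle *h) { delete h; }

}  // extern "C"
```

The C code that talks to PostgreSQL would check each status code and raise the error itself, outside any C++ stack frame. Memory obtained with `new` is not released by memory-context cleanup, so in this design the C glue would also have to arrange for `acc_destroy` to be called.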

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Maintaining state across function calls

From

Peter Geoghegan

Date:

20 November 2012, 04:58:04

On 20 November 2012 01:30, Craig Ringer <craig@2ndquadrant.com> wrote:
> Otherwise you'll have to translate error handling mechanisms at every
> boundary between C++ and Pg code, something I'm not even certain is
> possible to do reliably.

I think it's probably the case that PLV8 is the most mature example of
wrapping a C++ library that is liable to throw C++ exceptions within
Postgres backend code, in a sane way (that is, avoiding unwinding the
stack via longjmp() over a part of the stack where a destructor needs
to be called, which is undefined in C++).

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

Re: Maintaining state across function calls

From

"Kevin Grittner"

Date:

21 November 2012, 05:26:36

Craig Ringer wrote:

> If at all possible, isolate your C++ code from the PostgreSQL
> aggregate implementation. Pass the C++ code pre-allocated buffers
> to work with if you can, and manage the allocations in the Pg C
> code. Turn your C++ code into a library that presents only `extern
> "C"` interfaces and opaque types if you can.

+1

You definitely want to separately compile the C code which interfaces
with PostgreSQL and calls C entry points to the C++ code. A clear and
clean boundary here is critical to reliability and maintainability.

-Kevin

Re: Maintaining state across function calls

From

Chris Angelico

Date:

21 November 2012, 07:20:38

On Tue, Nov 20, 2012 at 12:30 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
> C++ exception handling and the PostgreSQL backend's longjmp() based
> error handling will interact in exciting and interesting ways.

Define "interesting"? You mean in Wash's sense of "Oh God, oh God,
we're going to receive signal 9"?

Not a huge fan of C++ exception handling myself, it seems to interact
with a few things that way.

ChrisA