Re: Functions in C with Ornate Data Structures - Mailing list pgsql-novice

From Tom Lane
Subject Re: Functions in C with Ornate Data Structures
Msg-id 11759.1011410887@sss.pgh.pa.us
In response to Re: Functions in C with Ornate Data Structures  ("Stephen P. Berry" <spb@meshuggeneh.net>)
List pgsql-novice
"Stephen P. Berry" <spb@meshuggeneh.net> writes:
>> For example, assuming that you are willing to cheat to the extent of
>> assuming sizeof(pointer) = sizeof(integer), try something like this:

> I'd actually thought of doing something like this, but couldn't find
> an actual explicit argument type for pointers[0], and I can't make
> the assumption you describe for portability reasons (my three main
> test platforms are alpha, sparc64, and x86).

Fair enough.  I had actually thought better of that shortly after writing,
so here's how I'd really do it:

Still declare the state datatype as "integer" at the SQL level, and
say initcond = 0.  (If you don't do this, you have to fight nodeAgg.c's
ideas about what to do with a pass-by-reference datatype, and it ain't
worth the trouble.)  But in the C code, write the fetching and returning
of the state value as

    datstruct *ptr = (datstruct *) PG_GETARG_POINTER(0);

    ...

    PG_RETURN_POINTER(ptr);

This relies on the fact that what you are *really* passing and returning
is not an int but a Datum, and Datum is by definition large enough for
pointers.  The only part of the above that's even slightly dubious is
the assumption that a Datum created from an int32 zero will read as a
pointer NULL --- but I am not aware of any platform where a zero bit
pattern doesn't read as a pointer NULL (and lots of pieces of Postgres
would break on such a platform).  You could get around that too by
making the initial state condition be a SQL NULL instead of a zero, but
I don't see the point.  Unless you need to treat NULL input values as
something other than "ignores", you really want to declare the sfunc as
strict, and that gets in the way of using a NULL initcond.
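For anyone who wants to poke at the mechanism outside the server, here is a minimal standalone model in plain C. It does not use the PostgreSQL headers. `Datum` is a hypothetical stand-in typedef'd to `uintptr_t`, and `accum_state`, `my_sfunc` and `run_aggregate` are made-up names for illustration. The model covers the three properties the argument relies on: the state Datum is wide enough to carry a pointer, an initcond of integer zero reads back as a NULL pointer on the first call, and because the transition function is strict, the driver skips NULL inputs, so the function only has to cope with a NULL *state*. A real sfunc would also have to allocate its struct in memory that survives for the whole aggregation; this sketch sidesteps that by using plain malloc.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for PostgreSQL's Datum: an unsigned integer type
 * guaranteed to be wide enough to hold an object pointer. */
typedef uintptr_t Datum;

_Static_assert(sizeof(Datum) >= sizeof(void *), "Datum must hold a pointer");

/* Hypothetical aggregate state, hidden behind a single pointer. */
typedef struct accum_state {
    long count;
    long sum;
} accum_state;

/* Plays the role of a strict sfunc.  The driver never passes NULL inputs,
 * so the only NULL to handle is the state itself, which starts out as the
 * integer zero from initcond = 0. */
static Datum my_sfunc(Datum state, long input)
{
    accum_state *ptr = (accum_state *) (void *) state; /* ~ PG_GETARG_POINTER(0) */

    if (ptr == NULL) {                  /* first non-NULL row */
        ptr = calloc(1, sizeof *ptr);
        if (ptr == NULL)
            abort();                    /* out of memory */
    }
    ptr->count++;
    ptr->sum += input;
    return (Datum) (void *) ptr;        /* ~ PG_RETURN_POINTER(ptr) */
}

/* Hypothetical driver mimicking the executor: start from a zero Datum,
 * skip NULL inputs (what declaring the sfunc strict buys you), and feed
 * the rest through the transition function.  Returns the state pointer,
 * or NULL if every input was NULL. */
static accum_state *run_aggregate(const long *values, const int *isnull, size_t n)
{
    Datum state = (Datum) 0;            /* initcond = 0 */

    for (size_t i = 0; i < n; i++) {
        if (isnull[i])
            continue;
        state = my_sfunc(state, values[i]);
    }
    return (accum_state *) (void *) state;
}
```

If every input is NULL, the state never leaves its initial zero, so whatever consumes the final state (a final function, in the server) has to be prepared for a NULL pointer.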

> Is there any way to keep `intermediate' data used by user-defined
> functions around indefinitely?  I.e., have some sort of crunch_init()
> function that creates a bunch of in-memory data structures, which
> can then be used by subsequent (and independent) queries?

You can if you can figure out how to find them again.  However, the
only obvious answer to that is to use static variables, which falls
down miserably if someone tries to run two independent instances of
your aggregate in one query.  I'd suggest hewing closely to the external
behavior of standard aggregates --- ie, each one is an independent
calculation.  You can use the above techniques to build an efficient
implementation.  If you instead build something that has an API
involving state that persists across queries, I'm pretty sure you'll
regret it in the long run.
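To make the static-variable failure concrete, here is a toy model in plain C (all names hypothetical) of a query like SELECT myagg(a), myagg(b), where the executor interleaves calls to the two aggregate instances row by row. `static_sfunc` keeps its running total in a file-level static, so both instances update the same counter; `instance_sfunc` keeps it in a state struct owned by each instance, which is how an ordinary aggregate's transition value behaves.

```c
#include <stddef.h>

/* Broken design: one accumulator shared by every caller in the process. */
static long shared_total = 0;

static long static_sfunc(long input)
{
    shared_total += input;
    return shared_total;
}

/* Per-instance design: each aggregate instance owns its own state. */
typedef struct sum_state {
    long total;
} sum_state;

static void instance_sfunc(sum_state *state, long input)
{
    state->total += input;
}

/* Simulates SELECT myagg(a), myagg(b): for each row the executor calls
 * the transition function once for column a, then once for column b. */
static void run_two_aggregates(const long *a, const long *b, size_t nrows,
                               long *static_a, long *static_b,
                               long *inst_a, long *inst_b)
{
    sum_state sa = {0}, sb = {0};

    shared_total = 0;
    *static_a = *static_b = 0;
    for (size_t i = 0; i < nrows; i++) {
        *static_a = static_sfunc(a[i]);
        *static_b = static_sfunc(b[i]);
        instance_sfunc(&sa, a[i]);
        instance_sfunc(&sb, b[i]);
    }
    *inst_a = sa.total;
    *inst_b = sb.total;
}
```

With a = {1, 2, 3} and b = {10, 20, 30}, the per-instance sums come out as 6 and 60, while the static version reports 36 and 66 because each instance also sees the other's inputs.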

> It seems like the general class of thing I'm trying to accomplish
> isn't that esoteric.  Imagine trying to write a function to compute
> the standard deviation of arbitrary precision numbers using the GMP
> library or some such.  Note that I'm not saying that that's what I'm
> trying to do...I'm just offering it as a simple sample problem in
> which one can't pass everything as an argument in an aggregate.  How
> does one set about doing such a thing in Postgres?

Without blinking an eye, I'd do it exactly as described above.
Stick all the intermediate state into a data structure that's referenced
by a single master pointer, and pass the pointer as the "state value"
of the aggregate.
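A sketch of the "single master pointer" layout for the standard-deviation example, in plain C. To stay self-contained it uses doubles and Welford's online update instead of GMP (a substitution, not what the thread prescribes), plus a hypothetical `Datum` typedef in place of the PostgreSQL one; `stats_state`, `stats_sfunc` and `stats_var_ffunc` are made-up names. The count, running mean and sum of squared deviations all live in one struct, and only the pointer to it travels as the state value. The final function returns the sample variance; taking `sqrt()` of it gives the standard deviation.

```c
#include <math.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for PostgreSQL's pointer-sized Datum. */
typedef uintptr_t Datum;

/* All intermediate state lives here.  With GMP, these fields would be
 * arbitrary-precision values instead of doubles. */
typedef struct stats_state {
    long   n;      /* number of non-NULL inputs seen */
    double mean;   /* running mean */
    double m2;     /* running sum of squared deviations from the mean */
} stats_state;

/* Strict transition function: Welford's update of count, mean and m2.
 * A zero state (initcond = 0) means "no rows yet". */
static Datum stats_sfunc(Datum state, double x)
{
    stats_state *st = (stats_state *) (void *) state;
    double delta;

    if (st == NULL) {
        st = calloc(1, sizeof *st);
        if (st == NULL)
            abort();               /* out of memory */
    }
    st->n++;
    delta = x - st->mean;
    st->mean += delta / st->n;
    st->m2 += delta * (x - st->mean);
    return (Datum) (void *) st;
}

/* Final function: sample variance, or NAN where SQL would yield NULL
 * (no rows, or only one row).  Frees the state, since this standalone
 * model has no memory context to clean up after it. */
static double stats_var_ffunc(Datum state)
{
    stats_state *st = (stats_state *) (void *) state;
    double result = NAN;

    if (st != NULL && st->n > 1)
        result = st->m2 / (st->n - 1);
    free(st);
    return result;
}
```

Feeding it 2, 4, 4, 4, 5, 5, 7, 9 gives a sample variance of 32/7 (about 4.571), i.e. a sample standard deviation of about 2.138.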

BTW, mlw posted some contrib code on pghackers just a day or two back
that does something similar to this.  He did some details differently
than I would've, notably this INT32-vs-POINTER business; but it's a
working example.

            regards, tom lane
