Re: Functions in C with Ornate Data Structures - Mailing list pgsql-novice
From | Tom Lane |
---|---|
Subject | Re: Functions in C with Ornate Data Structures |
Date | |
Msg-id | 11759.1011410887@sss.pgh.pa.us |
In response to | Re: Functions in C with Ornate Data Structures ("Stephen P. Berry" <spb@meshuggeneh.net>) |
List | pgsql-novice |
"Stephen P. Berry" <spb@meshuggeneh.net> writes: >> For example, assuming that you are willing to cheat to the extent of >> assuming sizeof(pointer) = sizeof(integer), try something like this: > I'd actually thought of doing something like this, but couldn't find > an actual explicit argument type for pointers[0], and I can't make > the assumption you describe for portability reasons (my three main > test platforms are alpha, sparc64, and x86). Fair enough. I had actually thought better of that shortly after writing, so here's how I'd really do it: Still make the declaration of the state datatype be "integer" at the SQL level, and say initcond = 0. (If you don't do this, you have to fight nodeAgg.c's ideas about what to do with a pass-by-reference datatype, and it ain't worth the trouble.) But in the C code, write acquisition and return of the state value as datstruct *ptr = (datstruct *) PG_GETARG_POINTER(0); ... PG_RETURN_POINTER(ptr); This relies on the fact that what you are *really* passing and returning is not an int but a Datum, and Datum is by definition large enough for pointers. The only part of the above that's even slightly dubious is the assumption that a Datum created from an int32 zero will read as a pointer NULL --- but I am not aware of any platform where a zero bit pattern doesn't read as a pointer NULL (and lots of pieces of Postgres would break on such a platform). You could get around that too by making the initial state condition be a SQL NULL instead of a zero, but I don't see the point. Unless you need to treat NULL input values as something other than "ignores", you really want to declare the sfunc as strict, and that gets in the way of using a NULL initcond. > Is there any way to keep `intermediate' data used by user-defined > functions around indefinitely? I.e., have some sort of crunch_init() > function that creates a bunch of in-memory data structures, which > can then be used by subsequent (and independent) queries? 
You can if you can figure out how to find them again. However, the only obvious answer to that is to use static variables, which falls down miserably if someone tries to run two independent instances of your aggregate in one query. I'd suggest hewing closely to the external behavior of standard aggregates --- ie, each one is an independent calculation. You can use the above techniques to build an efficient implementation. If you instead build something that has an API involving state that persists across queries, I'm pretty sure you'll regret it in the long run. > It seems like the general class of thing I'm trying to accomplish > isn't that esoteric. Imagine trying to write a function to compute > the standard deviation of arbitrary precision numbers using the GMP > library or some such. Note that I'm not saying that that's what I'm > trying to do...I'm just offering it as a simple sample problem in > which one can't pass everything as an argument in an aggregate. How > does one set about doing such a thing in Postgres? I blink not an eye to say that I'd do it exactly as described above. Stick all the intermediate state into a data structure that's referenced by a single master pointer, and pass the pointer as the "state value" of the aggregate. BTW, mlw posted some contrib code on pghackers just a day or two back that does something similar to this. He did some details differently than I would've, notably this INT32-vs-POINTER business; but it's a working example. regards, tom lane
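
The master-pointer idea above can be tried outside the backend. What follows is a minimal standalone sketch, not code from this thread and not the PostgreSQL API: `datum_t`, `datum_from_pointer`, `datum_to_pointer`, `crunch_sfunc`, and `crunch_final` are hypothetical stand-ins, and the fields of `datstruct` are invented for illustration (only the name `datstruct` comes from the message). It assumes, as the message does, that an integer zero converts to a NULL pointer, and it uses plain `calloc`/`free` where a real extension would use the backend's own allocator. It computes a variance rather than a standard deviation so that no math library is needed.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for a Datum: an integer type wide enough to hold a pointer.
 * Hypothetical; a real extension gets Datum from the PostgreSQL headers. */
typedef uintptr_t datum_t;

static datum_t datum_from_pointer(void *p) { return (datum_t) p; }
static void *datum_to_pointer(datum_t d) { return (void *) d; }

/* All intermediate state hangs off one master pointer.
 * The fields are an assumption chosen for this example. */
typedef struct datstruct
{
    long   count;   /* number of values seen */
    double sum;     /* running sum of values */
    double sum_sq;  /* running sum of squared values */
} datstruct;

/* Transition step: takes the previous state and one input value,
 * returns the new state. The state travels as a pointer-sized integer. */
static datum_t crunch_sfunc(datum_t state, double value)
{
    datstruct *ptr = (datstruct *) datum_to_pointer(state);

    /* An initial condition of 0 reads back as NULL: allocate on first call. */
    if (ptr == NULL)
    {
        ptr = calloc(1, sizeof *ptr);
        if (ptr == NULL)
            abort();
    }

    ptr->count++;
    ptr->sum += value;
    ptr->sum_sq += value * value;

    return datum_from_pointer(ptr);
}

/* Final step: population variance of the values seen, then release state.
 * Must only be called on a state that has seen at least one value. */
static double crunch_final(datum_t state)
{
    datstruct *ptr = (datstruct *) datum_to_pointer(state);
    double mean, variance;

    assert(ptr != NULL && ptr->count > 0);

    mean = ptr->sum / ptr->count;
    variance = ptr->sum_sq / ptr->count - mean * mean;

    free(ptr);
    return variance;
}
```

Because each running calculation owns its own `datstruct`, two calculations started from separate zero states do not interfere with each other, which is the property that static variables fail to provide.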