Thread: Aggregates with context - a question

Aggregates with context - a question

From
Philip Warner
Date:
Dear All,

I have written a STDEV aggregate function which seems to work; the only problem is that I am not sure that the method I
amusing was intended by the designers of aggregates.
 

I am using the first parameter of sfunc1 to store a pointer to a context record, and sfunc1 creates and returns that
pointerwhen it is first called (the DB definition specified initcond1 = '0'). In this way I can accumulate all the
detailsI need.
 

The problems is that the way I read the code, there is an expectation that the parameters of sfuncX and finalfunc match
thebasetype. This is clearly not the case, and I have declared them as int4, which will also presumable break on 64 bit
implementations...

Anyway, the code is below. Any suggestions as to how I should handle complex context within aggregates would be
appreciated.

Thanks,

Philip Warner.


-------------- STDEv Func:
typedef struct {   float4      ssq;   float4      s;   float4      n;
} ctx_type;

int4
sdev_ag1(int4 ctxPtr, float4 *f2)
{       ctx_type        *ctx;       int4            *result;
       if ((ctxPtr) == 0) {               ctx = (ctx_type*)palloc(sizeof(ctx_type));               ctx->ssq = 0;
      ctx->s = 0;               ctx->n = 0;       } else {               ctx = (ctx_type*)(ctxPtr);       };
 
       ctx->ssq += (*f2) * (*f2);       ctx->s += (*f2);       ctx->n++;
       return (int4)ctx;
}

float4*
sdev_fin(int4 ctxPtr, void* p)
{       ctx_type        *ctx;       float4          *result;       float4          avg;
       result = (float4*)palloc(sizeof(float4));
       if ((ctxPtr) == 0) {               (*result) = 0;       } else {               ctx = (ctx_type*)(ctxPtr);
      avg = ctx->s / ctx->n;               (*result) = (ctx->ssq - 2*avg*ctx->s)/ctx->n + avg*avg;
(*result)= sqrt((*result));       };
 
       pfree(ctx);
       return result;
}


----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.C.N. 008 659 498)             |          /(@)   ______---_
Tel: +61-03-5367 7422            |                 _________  \
Fax: +61-03-5367 7430            |                 ___________ |
Http://www.rhyme.com.au          |                /           \|                                |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/


Re: [HACKERS] Aggregates with context - a question

From
wieck@debis.com (Jan Wieck)
Date:
Philip Warner wrote:

>
> Dear All,
>
> I have written a STDEV aggregate function which seems to work; the only problem is that I am not sure that the method
Iam using was intended by the designers of aggregates. 
>
> I am using the first parameter of sfunc1 to store a pointer to a context record, and sfunc1 creates and returns that
pointerwhen it is first called (the DB definition specified initcond1 = '0'). In this way I can accumulate all the
detailsI need. 
>
> The problems is that the way I read the code, there is an expectation that the parameters of sfuncX and finalfunc
matchthe basetype. This is clearly not the case, and I have declared them as int4, which will also presumable break on
64bit implementations... 
>
> Anyway, the code is below. Any suggestions as to how I should handle complex context within aggregates would be
appreciated.


    Here's  another  implementation  of  standard  deviation  and
    variance I've done some month's ago. It's based on  it's  own
    datatype   holding   2  float64  values  as  aggregate  state
    variable. So it doesn't need  to  pass  pointers  of  context
    records around.

    I  wanted  to  wait for some other statistical functions that
    could be added to the package  before  putting  it  into  the
    contrib.  Maybe  you have some suggestions what would be nice
    to add it to? I'll add it anyway to the contrib for v6.6.


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#========================================= wieck@debis.com (Jan Wieck) #


begin 644 statmath.tar.gz
M'XL(`/U;7C<"`^U:>W/;R`WWO]Q/@3@//6)1HF3)&9^=.<567'?D1V5[>IW+
MC88B*9D-12I\R,DUU\]>8)=/DY;EJ^-F4F(2B5PN@`7P`Q9+V?-5?Z[ZU\V-
M;T>PW=KI=F$#D#HM_JVTQ7=(+8!>;Z?7;G?Q`Y^V=EJ=#>AN/`$%GJ^Z`!N+
MF??)6C%/GZ@;/R!Y4?Q/U(_&U+2,;Z!#:;5ZV]MWQK_=ZFY3_)7M7K?;57;H
MZ7:GMP&M,O[?G)XW'HO8<_8<(A1!>(\4#TT=%_QK`R+(P=S1`\N0:>+CK8*=
M'QT>CV`?9+DIR^QB="!N7U3Y@UK3<S7&3%NS`MW`43&A%N-?GEG.1+48.SX]
M&%X=#L9GYY>P+S6.06["!R;15<*5'XDDYQXL'-=OXBK.1I>G_9-!C;&#]\/^
MT06\IL6EM-7P5CP:7PQQVLG9(3'@(B+/\;&K(0V]J(:/:R^JA\.+J_?OCW\1
M/-QL?#X\?L?U"V][C%W\;7C[&4*?,=6R=B5)3`,^(JYQD&M!A2@9'T0#7*W,
M9YHVKHWXXYG`KT@Z^8"KK*%+M`4T%E`PJXD//=\U%ZFA1&V&,=:;2&XR%K/M
MXNSY1]UT.<//]"2<E7ORDJ3L`O^239N68.BPZ7T5>L?G_<N_D+:O!6OZN@E[
M\&(/WG))LG#]X&(7,H%XF;HC11HCYQT<)$&N0<.[5EW4VW!0%(ID3#<6AJT#
M?NW&\QLG)[=Q4I<U>"OF,J99AFK3='<.C2G4_WW;R^EP38U/4!6<6R^J-Z:E
M:ZK+%>)(K1;G1R@<_YM3ME'2C[O_1Q>R[FA/N__#CB+V_VZKLZUT:?_OM'N=
M<O]_"KK`L)N>;VJJ!>ILYAHSU<>M.K`UWW1LCV_:YX[GXQ,LHC+N$DO#56=&
M=6HYJO^F)DGA`"Q5*S!`<^:+P,=B-OG"L!9)$L;7#B_&XS%=C2GF"GW0'BE]
MQLT9-VZZ?POPF0D>Y**/YAAGFR&_N:\PQI:J:ZJVEJP@&BC4'2N/M$,[41^?
M/\;0IF<>9-92_0S0@,\U2-;44*)%Q:L*E\4\7]>-9>(7Q)6MBZ*Z-%7R9L$"
MQ^-P65(3&V$0B]CGE@-XL=X/N+\Q7!?[JVK#WTU#^PA[_[RA[Y]U8V)Z,DI^
MRQXG_[5'Q]CJ_%?:7:45Y7]/:2LTM-,K^_\GH68=4LTSU.$D[/^B5A/J^(]W
MGKDR(:I#F/];$.<AXAXB]!-SG`!R*"U&,2G,J$></X]ZCSW,*-.1K]]FARQS
MDAWCJ,V,&*YK<\9D;',ABI@G7V^F1@/?M+SF`EM71Z,GC.4<<FAHENKR]7MB
M_;%SQG1A8.$;FW:U)AW;F-UQ[>3.X1/`=U7;,_D@=]+$,HK%.(&/<LX"_T\*
MTAS+,C22(5W<9HC%^0Z$\\#TC;F7D1!&<SSE%KTW;0QX9B%1N8]1D&&/,'`W
M?X*20@&BBM[-GA33#/]M$/$JW-LNCI2&[3;4\;!1^XE%UT61B(7@`$[-"XW<
MG9ZX!=$=.1?9<EQI%V<Y3=O?!KN()^/7=9E2OKR#I0#NB7'7F*`X'1IAUKYS
M`EOW@(^;]HR'0^QV8N_W4&[4*W@%0>$U1(.E8^HLIZ6J.P&BF235V+^89$ZA
M.E4G7I4&<#,^?#<<G_1_H9.@83FSZF`T.AMMX5$M%+0+#KH5EW.SB79Q=EK[
MLWUHR2UX]0H287M"V/'I"F%H:2+MCY5^2B$K=E5!)8C>?-Q*8NJ9=-57P?^R
M,*@^2I<XT>3\=!Q&/^/9$/P;)_1UQ0//6%!!PC[BQO2O0:5T1F&5W8I<X/8P
M].R>5""GAU/Q"&GX.`']**(B2?Y\@7<T6Y+JVD+9`OQL(X"XIY]Q`9PM<&V(
MT%:#TZOAD"8UZTS"E?6IS)(+O(6J)2^$$"Z!Y=.,)@M58P^42!'EN>J9OQO.
M-!HFI^$)&-JUE((CP^<2IZ;K^6`'\XGAAG+YGH!B6V@'6H-7N&C?T:OXM06O
MT*8(-V0?`0?]"5^_0LBX#X-1__1HL`(T)E:#&:Y=1`^MPV&HO/0JFUL@JHV4
MQSVNA1X(LW]M_88+X\[.V>09FH,;:\:HC"&O7_/`H"GME"EM;LJ'UM/;HN1M
M&0F`^!SBON%2>9]@D7>_A"H2$-"\"(9K)B#6[#@#B_;0=5/P=@:))&%K;!*4
M1")))L'T5Z75WOXME41+C,XRE32<(\Z:!;I5#Q;5S:H=6%:-E['8<>=8"00,
MDE)+=\;GT(MAA*CU$O(X6Y)=Q*"&Z:>+_`M]O50(0@)\J'+9CFX5NO46I'E:
M17L0'R_E^FR7/A`%5$0/CX_0)"5]TPX1D#8)F>\MHU%+$@5PK?Z%ART*;;1'
M$G>Z!2WJ&;RXU-[96,&U8^%VU\[N<3(@$_&*$F-Z`E7!')RIP#6U5%OIC,U.
M(%Z:`Y\"57?YCKAPL>?6?&^MZKU&S[%&*4]!D.K",\'W'53P`W%$YE)MXR;T
M>K;<A87R-=1IV0^LJBD92BP#%:\K:W55N^V(]:M8JBN,<T#TO^*M@9I-@LCS
MN79\+1`5MJ#!/.X,'X"@X'^#FS1F^M12@FXN38]<,_D"OQNN`[@^W.QX/D8%
MP]!#7;1VFS;"%JX>C97JL7;L%W]*+(J-_Z,8HZ$G,P"M4[8W(6QI:_8J5"5Z
MOP&FTJ>&AX"J^(QV'ZCN/Z.LAI6ZG*4WRU#:7=OE]P8UR_!H+U!MW#%686X/
MZ]V?@)QJ:8&EAJ"+=SHA&!V7*HM9X,6A1%NCHM>(Y]81O[,:L<0\^%216]SF
M/&@C:5GDID/UF/!]Q)/5>N>BU<?EAZ!7R"JQF\=NOBM;`\7"G?3XD^M7'P3D
M.Y`L)&9QG`3MX2C^<7[_$S^%/^'O?YV.HL2__[4['?K];Z>CE.__GX(:#4A'
MGF%QQ*%C\><<<6G-O_JGB>'D`]=04QU[7'-7%>?PW34*.!@-^I<#>']U>G!Y
M?'9:_,;66:B?`@-KSVAP>34ZO<C/8E+_`BJW_FRCPJ1A__3HJG\T@,I!!3/[
M?FUTIL^I$P,/TW'YC_-!7CY4L9B&KSXLPY[YUUAXE-X6C=+.ME_H`'SLB!<:
M^\5+Q@FJ9<[LN6'3'%$#&=6^7(SB>(I(A94>0Q70W[Z$[Q6R@;[3:]%A-+>F
M\$3ZYA%"EC-@FOE5P(MAYE^[QMHKSYV`WHA]/K5@,?S?`BO3%1?XZ9LH3;4R
MCZPR%XW4KX%X._<,:YIR?/_H:#0XHJOHH(KXGZB>P0O$?JA[BW9H'%!R(_0R
MBI9+]Q3Q9,;"BL:B.::MX1!'!PVG4R45;9YIID]O9DA8I55)C;3%"$^<G`5Q
M*WV?"7F7K[#F=BX]P*HTM')F[:YO6-A=?2]F)>!]@%'E'Z>55%)))9544DDE
IE51222655%)))9544DDEE51222655%)))9544DG_K_0?^Q8S<`!0````
`
end

Re: [HACKERS] Aggregates with context - a question

From
Tom Lane
Date:
Philip Warner <pjw@rhyme.com.au> writes:
> The problems is that the way I read the code, there is an expectation
> that the parameters of sfuncX and finalfunc match the basetype.

No, not at all.  transfn1 has the signaturetranstype1 = transfn1 (transtype1, basetype)
ie, you take the state object and the next input value, and deliver a
new state object.  transfn2 has the signaturetranstype2 = transfn2 (transtype2)
ie, it operates on the second state object and delivers a new value.
And finalfn has the signaturefinaltype = finalfn (transtype1, transtype2)
ie, take the two state objects and deliver the desired result.
(You can also use a finalfn that takes only one state object,
if you are only using one transfn.)

If you omit finalfn then you can only supply one transfn, and the
corresponding transtype must equal finaltype because the default
behavior is just to copy the state value computed by the transfn
that's present.  But as long as you have a finalfn, all four datatypes
can be different.

This undoubtedly seems overly baroque, and it is overkill when writing
a specialized transfn from scratch; there's no need for transfn2, just
a transfn1 and a finalfn to produce the end result.  The reason it was
done this way (I assume; I wasn't there) is that you can produce many
useful aggregates by combining *existing* operators without having to
write any code at all.  For example, avg() for float8 is built from
transfn1 = float8pl, transfn2 = float8inc, and finalfn = float8div.

I am not sure whether this is adequately documented; probably not.
Will revisit the docs for 6.6 (I suppose Thomas will have my head
if I change 'em now ;-)).

> This is clearly not the case, and I have declared them as int4, which
> will also presumable break on 64 bit implementations...

The "clean" way to do it is to make a declared datatype that corresponds
to the state storage you need, but that's overkill if such a datatype
hasn't got any other use.  I think the way you have done it is a
reasonable cheat, though I agree that using int4 is risky for
portability.  There has been some talk of inventing a type OID
representing "C string", and that (when available) might be a better way
of declaring transtype1 when it's really a private struct of some sort.

One thing you have to be very careful about is memory allocation and
lifetime.  The way you are doing it, a palloc in the first transfn1
iteration and a pfree in finalfn, will be fine.  However this may change
in 6.6, since we are going to have to do something to cure memory leaks
in aggregation.  (Currently, if the transfns are ones that palloc their
result value, as all float8 ops do for example, the storage is not
reclaimed until end of transaction.  That's no good if there are lots of
tuples...)

> Anyway, the code is below.

Looks OK except you are potentially pfreeing an uninit pointer in the
finalfn...
        regards, tom lane


Re: [HACKERS] Aggregates with context - a question

From
Philip Warner
Date:
At 10:20 9/06/99 -0400, Tom Lane wrote:

>> This is clearly not the case, and I have declared them as int4, which
>> will also presumable break on 64 bit implementations...
>
>The "clean" way to do it is to make a declared datatype that corresponds
>to the state storage you need, but that's overkill if such a datatype
>hasn't got any other use.  I think the way you have done it is a
>reasonable cheat, though I agree that using int4 is risky for
>portability.  There has been some talk of inventing a type OID
>representing "C string", and that (when available) might be a better way
>of declaring transtype1 when it's really a private struct of some sort.

Sounds like a wonderful idea; does this mean that users can be prevented from declaring a column of type 'C String'? Or
doyou then need to build all the support functions? I suppose the alternative would be to use a 'varbinary' (or
varchar?),which has the first word being the structure length. That would at least be standard.
 

>One thing you have to be very careful about is memory allocation and
>lifetime.  The way you are doing it, a palloc in the first transfn1
>iteration and a pfree in finalfn, will be fine.  However this may change
>in 6.6, since we are going to have to do something to cure memory leaks
>in aggregation.  (Currently, if the transfns are ones that palloc their
>result value, as all float8 ops do for example, the storage is not
>reclaimed until end of transaction.  That's no good if there are lots of
>tuples...)
>
>> Anyway, the code is below.
>
>Looks OK except you are potentially pfreeing an uninit pointer in the
>finalfn...
>

Oops - pretty clever, considering I had already checked if it was zero...

Thanks for the information,

Philip

----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.C.N. 008 659 498)             |          /(@)   ______---_
Tel: +61-03-5367 7422            |                 _________  \
Fax: +61-03-5367 7430            |                 ___________ |
Http://www.rhyme.com.au          |                /           \|                                |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/


Re: [HACKERS] Aggregates with context - a question

From
Philip Warner
Date:
I have attached the latest attempt at my stats functions. They are defined for int4, float4 and float8, and now use a
'text'type to store context information. This gets rid of the need to pass pointers as int4, but brings me to the next
question:

is it OK to store 'binary' data (possibly containing \0) in a text field, so long as that field is never displayed?

If it is, then I presume text fields are the best approach to preserving context in complex aggregate functions. Is
thisreasonable? 

Any further (polite) suggestions would be welcome.

If all is OK, my final question is: would people be happy to incorporate such code into PG for general use once all
relevanttypes are supported? 




----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.C.N. 008 659 498)             |          /(@)   ______---_
Tel: +61-03-5367 7422            |                 _________  \
Fax: +61-03-5367 7430            |                 ___________ |
Http://www.rhyme.com.au          |                /           \|
                                 |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/
Attachment

Re: [HACKERS] Aggregates with context - a question

From
Tom Lane
Date:
Philip Warner <pjw@rhyme.com.au> writes:
> At 10:20 9/06/99 -0400, Tom Lane wrote:
>> ... There has been some talk of inventing a type OID
>> representing "C string", and that (when available) might be a better way
>> of declaring transtype1 when it's really a private struct of some sort.

> Sounds like a wonderful idea; does this mean that users can be
> prevented from declaring a column of type 'C String'? Or do you then
> need to build all the support functions?

The original proposal was to have a type OID that would represent the
textual input or output of datatype input/output functions, in order to
solve the problems we have now with not being able to typecheck explicit
uses of these functions.  There would be no reason to make it a "real"
type that could be declared as a column type, AFAICS.  You'd have to be
able to refer to it by name in order to declare user-supplied datatype
I/O functions, however.  Might take a bit of a kluge to make the type
acceptable for one purpose and not the other...

> I suppose the alternative
> would be to use a 'varbinary' (or varchar?), which has the first word
> being the structure length. That would at least be standard.

That's actually probably a better idea; I'd suggest the existing "bytea"
type could be used to represent the workspace datatype for aggregates
that are really using a private struct.  Not sure how you'd get it
initialized at aggregate startup, but that's probably doable.
        regards, tom lane