Thread: Aggregates with context - a question
Dear All, I have written a STDEV aggregate function which seems to work; the only problem is that I am not sure that the method I amusing was intended by the designers of aggregates. I am using the first parameter of sfunc1 to store a pointer to a context record, and sfunc1 creates and returns that pointerwhen it is first called (the DB definition specified initcond1 = '0'). In this way I can accumulate all the detailsI need. The problems is that the way I read the code, there is an expectation that the parameters of sfuncX and finalfunc match thebasetype. This is clearly not the case, and I have declared them as int4, which will also presumable break on 64 bit implementations... Anyway, the code is below. Any suggestions as to how I should handle complex context within aggregates would be appreciated. Thanks, Philip Warner. -------------- STDEv Func: typedef struct { float4 ssq; float4 s; float4 n; } ctx_type; int4 sdev_ag1(int4 ctxPtr, float4 *f2) { ctx_type *ctx; int4 *result; if ((ctxPtr) == 0) { ctx = (ctx_type*)palloc(sizeof(ctx_type)); ctx->ssq = 0; ctx->s = 0; ctx->n = 0; } else { ctx = (ctx_type*)(ctxPtr); }; ctx->ssq += (*f2) * (*f2); ctx->s += (*f2); ctx->n++; return (int4)ctx; } float4* sdev_fin(int4 ctxPtr, void* p) { ctx_type *ctx; float4 *result; float4 avg; result = (float4*)palloc(sizeof(float4)); if ((ctxPtr) == 0) { (*result) = 0; } else { ctx = (ctx_type*)(ctxPtr); avg = ctx->s / ctx->n; (*result) = (ctx->ssq - 2*avg*ctx->s)/ctx->n + avg*avg; (*result)= sqrt((*result)); }; pfree(ctx); return result; } ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.C.N. 008 659 498) | /(@) ______---_ Tel: +61-03-5367 7422 | _________ \ Fax: +61-03-5367 7430 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/
Philip Warner wrote: > > Dear All, > > I have written a STDEV aggregate function which seems to work; the only problem is that I am not sure that the method Iam using was intended by the designers of aggregates. > > I am using the first parameter of sfunc1 to store a pointer to a context record, and sfunc1 creates and returns that pointerwhen it is first called (the DB definition specified initcond1 = '0'). In this way I can accumulate all the detailsI need. > > The problems is that the way I read the code, there is an expectation that the parameters of sfuncX and finalfunc matchthe basetype. This is clearly not the case, and I have declared them as int4, which will also presumable break on 64bit implementations... > > Anyway, the code is below. Any suggestions as to how I should handle complex context within aggregates would be appreciated. Here's another implementation of standard deviation and variance I've done some month's ago. It's based on it's own datatype holding 2 float64 values as aggregate state variable. So it doesn't need to pass pointers of context records around. I wanted to wait for some other statistical functions that could be added to the package before putting it into the contrib. Maybe you have some suggestions what would be nice to add it to? I'll add it anyway to the contrib for v6.6. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #========================================= wieck@debis.com (Jan Wieck) # begin 644 statmath.tar.gz M'XL(`/U;7C<"`^U:>W/;R`WWO]Q/@3@//6)1HF3)&9^=.<567'?D1V5[>IW+ MC88B*9D-12I\R,DUU\]>8)=/DY;EJ^-F4F(2B5PN@`7P`Q9+V?-5?Z[ZU\V- M;T>PW=KI=F$#D#HM_JVTQ7=(+8!>;Z?7;G?Q`Y^V=EJ=#>AN/`$%GJ^Z`!N+ MF??)6C%/GZ@;/R!Y4?Q/U(_&U+2,;Z!#:;5ZV]MWQK_=ZFY3_)7M7K?;57;H MZ7:GMP&M,O[?G)XW'HO8<_8<(A1!>(\4#TT=%_QK`R+(P=S1`\N0:>+CK8*= M'QT>CV`?9+DIR^QB="!N7U3Y@UK3<S7&3%NS`MW`43&A%N-?GEG.1+48.SX] M&%X=#L9GYY>P+S6.06["!R;15<*5'XDDYQXL'-=OXBK.1I>G_9-!C;]\/^ MT06\IL6EM-7P5CP:7PQQVLG9(3'@(B+/\;&K(0V]J(:/:R^JA\.+J_?OCW\1 M/-QL?#X\?L?U"V][C%W\;7C[&4*?,=6R=B5)3`,^(JYQD&M!A2@9'T0#7*W, M9YHVKHWXXYG`KT@Z^8"KK*%+M`4T%E`PJXD//=\U%ZFA1&V&,=:;2&XR%K/M MXNSY1]UT.<//]"2<E7ORDJ3L`O^239N68.BPZ7T5>L?G_<N_D+:O!6OZN@E[ M\&(/WG))LG#]X&(7,H%XF;HC11HCYQT<)$&N0<.[5EW4VW!0%(ID3#<6AJT# M?NW&\QLG)[=Q4I<U>"OF,J99AFK3='<.C2G4_WW;R^EP38U/4!6<6R^J-Z:E M:ZK+%>)(K1;G1R@<_YM3ME'2C[O_1Q>R[FA/N__#CB+V_VZKLZUT:?_OM'N= M<O]_"KK`L)N>;VJJ!>ILYAHSU<>M.K`UWW1LCV_:YX[GXQ,LHC+N$DO#56=& M=6HYJO^F)DGA`"Q5*S!`<^:+P,=B-OG"L!9)$L;7#B_&XS%=C2GF"GW0'BE] MQLT9-VZZ?POPF0D>Y**/YAAGFR&_N:\PQI:J:ZJVEJP@&BC4'2N/M$,[41^? M/\;0IF<>9-92_0S0@,\U2-;44*)%Q:L*E\4\7]>-9>(7Q)6MBZ*Z-%7R9L$" MQ^-P65(3&V$0B]CGE@-XL=X/N+\Q7!?[JVK#WTU#^PA[_[RA[Y]U8V)Z,DI^ MRQXG_[5'Q]CJ_%?:7:45Y7]/:2LTM-,K^_\GH68=4LTSU.$D[/^B5A/J^(]W MGKDR(:I#F/];$.<AXAXB]!-SG`!R*"U&,2G,J$></X]ZCSW,*-.1K]]FARQS MDAWCJ,V,&*YK<\9D;',ABI@G7V^F1@/?M+SF`EM71Z,GC.4<<FAHENKR]7MB M_;%SQG1A8.$;FW:U)AW;F-UQ[>3.X1/`=U7;,_D@=]+$,HK%.(&/<LX"_T\* MTAS+,C22(5W<9HC%^0Z$\\#TC;F7D1!&<SSE%KTW;0QX9B%1N8]1D&&/,'`W M?X*20@&BBM[-GA33#/]M$/$JW-LNCI2&[3;4\;!1^XE%UT61B(7@`$[-"XW< MG9ZX!=$=.1?9<EQI%V<Y3=O?!KN()^/7=9E2OKR#I0#NB7'7F*`X'1IAUKYS M`EOW@(^;]HR'0^QV8N_W4&[4*W@%0>$U1(.E8^HLIZ6J.P&BF235V+^89$ZA M.E4G7I4&<#,^?#<<G_1_H9.@83FSZF`T.AMMX5$M%+0+#KH5EW.SB79Q=EK[ MLWUHR2UX]0H287M"V/'I"F%H:2+MCY5^2B$K=E5!)8C>?-Q*8NJ9=-57P?^R M,*@^2I<XT>3\=!Q&/^/9$/P;)_1UQ0//6%!!PC[BQO2O0:5T1F&5W8I<X/8P M].R>5""GAU/Q"&GX.`']**(B2?Y\@7<T6Y+JVD+9`OQL(X"XIY]Q`9PM<&V( MT%:#TZOAD"8UZTS"E?6IS)(+O(6J)2^$$"Z!Y=.,)@M58P^42!'EN>J9OQO. M-!HFI^$)&-JUE((CP^<2IZ;K^6`'\XGAAG+YGH!B6V@'6H-7N&C?T:OXM06O MT*8(-V0?`0?]"5^_0LBX#X-1__1HL`(T)E:#&:Y=1`^MPV&HO/0JFUL@JHV4 MQSVNA1X(LW]M_88+X\[.V>09FH,;:\:HC"&O7_/`H"GME"EM;LJ'UM/;HN1M M&0F`^!SBON%2>9]@D7>_A"H2$-"\"(9K)B#6[#@#B_;0=5/P=@:))&%K;!*4 M1")))L'T5Z75WOXME41+C,XRE32<(\Z:!;I5#Q;5S:H=6%:-E['8<>=8"00, MDE)+=\;GT(MAA*CU$O(X6Y)=Q*"&Z:>+_`M]O50(0@)\J'+9CFX5NO46I'E: M17L0'R_E^FR7/A`%5$0/CX_0)"5]TPX1D#8)F>\MHU%+$@5PK?Z%ART*;;1' M$G>Z!2WJ&;RXU-[96,&U8^%VU\[N<3(@$_&*$F-Z`E7!')RIP#6U5%OIC,U. M(%Z:`Y\"57?YCKAPL>?6?&^MZKU&S[%&*4]!D.K",\'W'53P`W%$YE)MXR;T M>K;<A87R-=1IV0^LJBD92BP#%:\K:W55N^V(]:M8JBN,<T#TO^*M@9I-@LCS MN79\+1`5MJ#!/.X,'X"@X'^#FS1F^M12@FXN38]<,_D"OQNN`[@^W.QX/D8% MP]!#7;1VFS;"%JX>C97JL7;L%W]*+(J-_Z,8HZ$G,P"M4[8W(6QI:_8J5"5Z MOP&FTJ>&AX"J^(QV'ZCN/Z.LAI6ZG*4WRU#:7=OE]P8UR_!H+U!MW#%686X/ MZ]V?@)QJ:8&EAJ"+=SHA&!V7*HM9X,6A1%NCHM>(Y]81O[,:L<0\^%216]SF M/&@C:5GDID/UF/!]Q)/5>N>BU<?EAZ!7R"JQF\=NOBM;`\7"G?3XD^M7'P3D M.Y`L)&9QG`3MX2C^<7[_$S^%/^'O?YV.HL2__[4['?K];Z>CE.__GX(:#4A' MGF%QQ*%C\><<<6G-O_JGB>'D`]=04QU[7'-7%>?PW34*.!@-^I<#>']U>G!Y M?'9:_,;66:B?`@-KSVAP>34ZO<C/8E+_`BJW_FRCPJ1A__3HJG\T@,I!!3/[ M?FUTIL^I$P,/TW'YC_-!7CY4L9B&KSXLPY[YUUAXE-X6C=+.ME_H`'SLB!<: M^\5+Q@FJ9<[LN6'3'%$#&=6^7(SB>(I(A94>0Q70W[Z$[Q6R@;[3:]%A-+>F M\$3ZYA%"EC-@FOE5P(MAYE^[QMHKSYV`WHA]/K5@,?S?`BO3%1?XZ9LH3;4R MCZPR%XW4KX%X._<,:YIR?/_H:#0XHJOHH(KXGZB>P0O$?JA[BW9H'%!R(_0R MBI9+]Q3Q9,;"BL:B.::MX1!'!PVG4R45;9YIID]O9DA8I55)C;3%"$^<G`5Q M*WV?"7F7K[#F=BX]P*HTM')F[:YO6-A=?2]F)>!]@%'E'Z>55%)))9544DDE IE51222655%)))9544DDEE51222655%)))9544DG_K_0?^Q8S<`!0```` ` end
Philip Warner <pjw@rhyme.com.au> writes: > The problems is that the way I read the code, there is an expectation > that the parameters of sfuncX and finalfunc match the basetype. No, not at all. transfn1 has the signaturetranstype1 = transfn1 (transtype1, basetype) ie, you take the state object and the next input value, and deliver a new state object. transfn2 has the signaturetranstype2 = transfn2 (transtype2) ie, it operates on the second state object and delivers a new value. And finalfn has the signaturefinaltype = finalfn (transtype1, transtype2) ie, take the two state objects and deliver the desired result. (You can also use a finalfn that takes only one state object, if you are only using one transfn.) If you omit finalfn then you can only supply one transfn, and the corresponding transtype must equal finaltype because the default behavior is just to copy the state value computed by the transfn that's present. But as long as you have a finalfn, all four datatypes can be different. This undoubtedly seems overly baroque, and it is overkill when writing a specialized transfn from scratch; there's no need for transfn2, just a transfn1 and a finalfn to produce the end result. The reason it was done this way (I assume; I wasn't there) is that you can produce many useful aggregates by combining *existing* operators without having to write any code at all. For example, avg() for float8 is built from transfn1 = float8pl, transfn2 = float8inc, and finalfn = float8div. I am not sure whether this is adequately documented; probably not. Will revisit the docs for 6.6 (I suppose Thomas will have my head if I change 'em now ;-)). > This is clearly not the case, and I have declared them as int4, which > will also presumable break on 64 bit implementations... The "clean" way to do it is to make a declared datatype that corresponds to the state storage you need, but that's overkill if such a datatype hasn't got any other use. I think the way you have done it is a reasonable cheat, though I agree that using int4 is risky for portability. There has been some talk of inventing a type OID representing "C string", and that (when available) might be a better way of declaring transtype1 when it's really a private struct of some sort. One thing you have to be very careful about is memory allocation and lifetime. The way you are doing it, a palloc in the first transfn1 iteration and a pfree in finalfn, will be fine. However this may change in 6.6, since we are going to have to do something to cure memory leaks in aggregation. (Currently, if the transfns are ones that palloc their result value, as all float8 ops do for example, the storage is not reclaimed until end of transaction. That's no good if there are lots of tuples...) > Anyway, the code is below. Looks OK except you are potentially pfreeing an uninit pointer in the finalfn... regards, tom lane
At 10:20 9/06/99 -0400, Tom Lane wrote: >> This is clearly not the case, and I have declared them as int4, which >> will also presumable break on 64 bit implementations... > >The "clean" way to do it is to make a declared datatype that corresponds >to the state storage you need, but that's overkill if such a datatype >hasn't got any other use. I think the way you have done it is a >reasonable cheat, though I agree that using int4 is risky for >portability. There has been some talk of inventing a type OID >representing "C string", and that (when available) might be a better way >of declaring transtype1 when it's really a private struct of some sort. Sounds like a wonderful idea; does this mean that users can be prevented from declaring a column of type 'C String'? Or doyou then need to build all the support functions? I suppose the alternative would be to use a 'varbinary' (or varchar?),which has the first word being the structure length. That would at least be standard. >One thing you have to be very careful about is memory allocation and >lifetime. The way you are doing it, a palloc in the first transfn1 >iteration and a pfree in finalfn, will be fine. However this may change >in 6.6, since we are going to have to do something to cure memory leaks >in aggregation. (Currently, if the transfns are ones that palloc their >result value, as all float8 ops do for example, the storage is not >reclaimed until end of transaction. That's no good if there are lots of >tuples...) > >> Anyway, the code is below. > >Looks OK except you are potentially pfreeing an uninit pointer in the >finalfn... > Oops - pretty clever, considering I had already checked if it was zero... Thanks for the information, Philip ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.C.N. 008 659 498) | /(@) ______---_ Tel: +61-03-5367 7422 | _________ \ Fax: +61-03-5367 7430 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/
I have attached the latest attempt at my stats functions. They are defined for int4, float4 and float8, and now use a 'text'type to store context information. This gets rid of the need to pass pointers as int4, but brings me to the next question: is it OK to store 'binary' data (possibly containing \0) in a text field, so long as that field is never displayed? If it is, then I presume text fields are the best approach to preserving context in complex aggregate functions. Is thisreasonable? Any further (polite) suggestions would be welcome. If all is OK, my final question is: would people be happy to incorporate such code into PG for general use once all relevanttypes are supported? ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.C.N. 008 659 498) | /(@) ______---_ Tel: +61-03-5367 7422 | _________ \ Fax: +61-03-5367 7430 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/
Attachment
Philip Warner <pjw@rhyme.com.au> writes: > At 10:20 9/06/99 -0400, Tom Lane wrote: >> ... There has been some talk of inventing a type OID >> representing "C string", and that (when available) might be a better way >> of declaring transtype1 when it's really a private struct of some sort. > Sounds like a wonderful idea; does this mean that users can be > prevented from declaring a column of type 'C String'? Or do you then > need to build all the support functions? The original proposal was to have a type OID that would represent the textual input or output of datatype input/output functions, in order to solve the problems we have now with not being able to typecheck explicit uses of these functions. There would be no reason to make it a "real" type that could be declared as a column type, AFAICS. You'd have to be able to refer to it by name in order to declare user-supplied datatype I/O functions, however. Might take a bit of a kluge to make the type acceptable for one purpose and not the other... > I suppose the alternative > would be to use a 'varbinary' (or varchar?), which has the first word > being the structure length. That would at least be standard. That's actually probably a better idea; I'd suggest the existing "bytea" type could be used to represent the workspace datatype for aggregates that are really using a private struct. Not sure how you'd get it initialized at aggregate startup, but that's probably doable. regards, tom lane