Thread: OLAP, Aggregates, and order of operations
I need to do some OLAP stuff, and I asked previously if there were a way to pass multiple parameters to an aggrigate function. i.e.: select mycube(value1, value2, value3) from table group by value1; I looked through the code and it is non-trivial to do, one would have to alter the grammar to include a number of parameters, I guess something like this: create aggregate (sfunc = myfunct, sfuncnargs=3, stype = int4, basetype1 = int4, basetype2 = int4, ....); Then change the catalog, and the execution, arrg! (God I wish I could spend the time I want on PostgreSQL! ) Anyway, short of that.... If I do this: select mycube(value1) as d1, dimention(value2) as d2, dimention(value3) as d3 group by value1; Can I safely assume the following: (1) mycube() will be called first (2) Assuming dimention() has no final func, that final func of mycube() will be called last.
mlw <markw@mohawksoft.com> writes: > I need to do some OLAP stuff, and I asked previously if there were a way > to pass multiple parameters to an aggrigate function. i.e.: > I looked through the code and it is non-trivial to do, Offhand I don't know of any fundamental reason why it couldn't be done, but you're right that it'd take a fair amount of work. > If I do this: > select mycube(value1) as d1, dimention(value2) as d2, dimention(value3) > as d3 group by value1; > Can I safely assume the following: > (1) mycube() will be called first > (2) Assuming dimention() has no final func, that final func of mycube() > will be called last. That might be true in the present code, but it strikes me as an awfully risky set of assumptions. Also, it sounds like what you have in mind is to have some hidden state that all the aggregate functions will access; how then will you work if there are more than one set of these aggregates being used in a query? If the needed parameters are all the same datatype, maybe you could put them into an array and pass the array as a single argument to the aggregate. regards, tom lane
mlw <markw@mohawksoft.com> writes: >> If the needed parameters are all the same datatype, maybe you could put >> them into an array and pass the array as a single argument to the >> aggregate. > How would you do this without having to make multiple SQL calls? I was thinking something like select my_aggregate(my_array_constructor(foo, bar, baz)) from ... where my_array_constructor is a quick hack C routine to build a 3-element array from 3 input arguments (s/3/whatever you need/). Someday we ought to have SQL syntax to build an array value from a list of scalars, but in the meantime an auxiliary function is the only way to do it. The overhead of constructing and then interpreting the temporary array value is slightly annoying, but I don't think it'll be horribly expensive. See the existing aggregate-related routines in numeric.c if you need some help with the C coding. regards, tom lane
Tom Lane wrote: > mlw <markw@mohawksoft.com> writes: > > I need to do some OLAP stuff, and I asked previously if there were a way > > to pass multiple parameters to an aggrigate function. i.e.: > > I looked through the code and it is non-trivial to do, > > Offhand I don't know of any fundamental reason why it couldn't be done, > but you're right that it'd take a fair amount of work. I understand the implications of the work, but it would be VERY cool to be able to do this for statistical stuff. > > > If I do this: > > select mycube(value1) as d1, dimention(value2) as d2, dimention(value3) > > as d3 group by value1; > > Can I safely assume the following: > > (1) mycube() will be called first > > (2) Assuming dimention() has no final func, that final func of mycube() > > will be called last. > > That might be true in the present code, but it strikes me as an awfully > risky set of assumptions. Also, it sounds like what you have in mind is > to have some hidden state that all the aggregate functions will access; > how then will you work if there are more than one set of these > aggregates being used in a query? What I was thinking is that I could use the state to hold a pointer returned by palloc. I don't think I can handle multiple mycube() calls, but short of reworking aggregates, I don't see any other way. > > If the needed parameters are all the same datatype, maybe you could put > them into an array and pass the array as a single argument to the > aggregate. How would you do this without having to make multiple SQL calls?
Tom Lane wrote: > > mlw <markw@mohawksoft.com> writes: > >> If the needed parameters are all the same datatype, maybe you could put > >> them into an array and pass the array as a single argument to the > >> aggregate. > > > How would you do this without having to make multiple SQL calls? > > I was thinking something like > > select my_aggregate(my_array_constructor(foo, bar, baz)) from ... > > where my_array_constructor is a quick hack C routine to build a > 3-element array from 3 input arguments (s/3/whatever you need/). > Someday we ought to have SQL syntax to build an array value from > a list of scalars, but in the meantime an auxiliary function is the > only way to do it. Interesting. Kind of ugly, but interesting. So, what would the order of operation be? I assume "my_array_constructor()" would be called first, and the return value then be passed to "my_aggregate()" along with the state value being set to the initial state, then subsequent calls to "my_array_constructor()", followed by "my_aggregate()" for each additional row in the group? I need to think about that. > > The overhead of constructing and then interpreting the temporary > array value is slightly annoying, but I don't think it'll be horribly > expensive. See the existing aggregate-related routines in numeric.c > if you need some help with the C coding. <postgres use story> I can do the C stuff, I have tons of C and C++ functions written for Postgres already, when I get the time to make them clean enough to contribute to the Postgres project, I will. (Text manipulation, search engine, date manipulation, xmcd, analysis functions, decode, and others) If you are interested in seeing a half Oracle, half Postgres site, take a look at http://www.dotclick.com. (You will need a Windows box) It is pretty evenly split between postgres and oracle. All "member" related data is on Oracle. All music related data is in Postgres. It has saved us probably $50K to $100 in Oracle database licenses and hardware to do it this way. We have three postgres boxes. One master, and two slaves. The master gets updated with new information from various sites. The program which does the updating, on the master, creates a SQL log script of everything it does. The script is then run against the slaves to maintain consistency. A web farm is split evenly between the two slaves. It is pretty cool. (As a side note, we are using Oracle for session management across a bunch of servers. Sadly we can not use postgres for this (we would love too), sessions are mostly updates and deletes, maybe when 7.2 comes out, but I'm still not sure about that.) </postgres use story>
mlw <markw@mohawksoft.com> writes: > So, what would the order of operation be? > I assume "my_array_constructor()" would be called first, and the return value > then be passed to "my_aggregate()" along with the state value being set to the > initial state, then subsequent calls to "my_array_constructor()", followed by > "my_aggregate()" for each additional row in the group? Check. regards, tom lane