Thread: OLAP, Aggregates, and order of operations

OLAP, Aggregates, and order of operations

From
mlw
Date:
I need to do some OLAP stuff, and I asked previously if there were a way
to pass multiple parameters to an aggrigate function. i.e.:

select mycube(value1, value2, value3) from table group by value1;

I looked through the code and it is non-trivial to do, one would have to
alter the grammar to include a number of parameters, I guess something
like this:

create aggregate (sfunc = myfunct, sfuncnargs=3, stype = int4, basetype1
= int4, basetype2 = int4, ....);

Then change the catalog, and the execution, arrg!

(God I wish I could spend the time I want on PostgreSQL! )

Anyway, short of that....

If I do this:

select mycube(value1) as d1, dimention(value2) as d2, dimention(value3)
as d3 group by value1;

Can I safely assume the following:

(1) mycube() will be called first
(2) Assuming dimention() has no final func, that final func of mycube()
will be called last.





Re: OLAP, Aggregates, and order of operations

From
Tom Lane
Date:
mlw <markw@mohawksoft.com> writes:
> I need to do some OLAP stuff, and I asked previously if there were a way
> to pass multiple parameters to an aggrigate function. i.e.:
> I looked through the code and it is non-trivial to do,

Offhand I don't know of any fundamental reason why it couldn't be done,
but you're right that it'd take a fair amount of work.

> If I do this:
> select mycube(value1) as d1, dimention(value2) as d2, dimention(value3)
> as d3 group by value1;
> Can I safely assume the following:
> (1) mycube() will be called first
> (2) Assuming dimention() has no final func, that final func of mycube()
> will be called last.

That might be true in the present code, but it strikes me as an awfully
risky set of assumptions.  Also, it sounds like what you have in mind is
to have some hidden state that all the aggregate functions will access;
how then will you work if there are more than one set of these
aggregates being used in a query?

If the needed parameters are all the same datatype, maybe you could put
them into an array and pass the array as a single argument to the
aggregate.
        regards, tom lane


Re: OLAP, Aggregates, and order of operations

From
Tom Lane
Date:
mlw <markw@mohawksoft.com> writes:
>> If the needed parameters are all the same datatype, maybe you could put
>> them into an array and pass the array as a single argument to the
>> aggregate.

> How would you do this without having to make multiple SQL calls?

I was thinking something like
   select my_aggregate(my_array_constructor(foo, bar, baz)) from ...

where my_array_constructor is a quick hack C routine to build a
3-element array from 3 input arguments (s/3/whatever you need/).
Someday we ought to have SQL syntax to build an array value from
a list of scalars, but in the meantime an auxiliary function is the
only way to do it.

The overhead of constructing and then interpreting the temporary
array value is slightly annoying, but I don't think it'll be horribly
expensive.  See the existing aggregate-related routines in numeric.c
if you need some help with the C coding.
        regards, tom lane


Re: OLAP, Aggregates, and order of operations

From
mlw
Date:
Tom Lane wrote:

> mlw <markw@mohawksoft.com> writes:
> > I need to do some OLAP stuff, and I asked previously if there were a way
> > to pass multiple parameters to an aggrigate function. i.e.:
> > I looked through the code and it is non-trivial to do,
>
> Offhand I don't know of any fundamental reason why it couldn't be done,
> but you're right that it'd take a fair amount of work.

I understand the implications of the work, but it would be VERY cool to be
able to do this for statistical stuff.

>
> > If I do this:
> > select mycube(value1) as d1, dimention(value2) as d2, dimention(value3)
> > as d3 group by value1;
> > Can I safely assume the following:
> > (1) mycube() will be called first
> > (2) Assuming dimention() has no final func, that final func of mycube()
> > will be called last.
>
> That might be true in the present code, but it strikes me as an awfully
> risky set of assumptions.  Also, it sounds like what you have in mind is
> to have some hidden state that all the aggregate functions will access;
> how then will you work if there are more than one set of these
> aggregates being used in a query?

What I was thinking is that I could use the state to hold a pointer returned
by palloc. I don't think I can handle multiple mycube() calls, but short of
reworking aggregates, I don't see any other way.

>
> If the needed parameters are all the same datatype, maybe you could put
> them into an array and pass the array as a single argument to the
> aggregate.

How would you do this without having to make multiple SQL calls?



Re: OLAP, Aggregates, and order of operations

From
mlw
Date:
Tom Lane wrote:
> 
> mlw <markw@mohawksoft.com> writes:
> >> If the needed parameters are all the same datatype, maybe you could put
> >> them into an array and pass the array as a single argument to the
> >> aggregate.
> 
> > How would you do this without having to make multiple SQL calls?
> 
> I was thinking something like
> 
>     select my_aggregate(my_array_constructor(foo, bar, baz)) from ...
> 
> where my_array_constructor is a quick hack C routine to build a
> 3-element array from 3 input arguments (s/3/whatever you need/).
> Someday we ought to have SQL syntax to build an array value from
> a list of scalars, but in the meantime an auxiliary function is the
> only way to do it.

Interesting. Kind of ugly, but interesting. 

So, what would the order of operation be?

I assume "my_array_constructor()" would be called first, and the return value
then be passed to "my_aggregate()" along with the state value being set to the
initial state, then subsequent calls to "my_array_constructor()", followed by
"my_aggregate()" for each additional row in the group?

I need to think about that.

> 
> The overhead of constructing and then interpreting the temporary
> array value is slightly annoying, but I don't think it'll be horribly
> expensive.  See the existing aggregate-related routines in numeric.c
> if you need some help with the C coding.

<postgres use story>
I can do the C stuff, I have tons of C and C++ functions written for Postgres
already, when I get the time to make them clean enough to contribute to the
Postgres project, I will. (Text manipulation, search engine, date manipulation,
xmcd, analysis functions, decode, and others) If you are interested in seeing a
half Oracle, half Postgres site, take a look at http://www.dotclick.com. (You
will need a Windows box)

It is pretty evenly split between postgres and oracle. All "member" related
data is on Oracle. All music related data is in Postgres. It has saved us
probably $50K to $100 in Oracle database licenses and hardware to do it this
way.

We have three postgres boxes. One master, and two slaves. The master gets
updated with new information from various sites. The program which does the
updating, on the master, creates a SQL log script of everything it does. The
script is then run against the slaves to maintain consistency. A web farm is
split evenly between the two slaves.

It is pretty cool. 

(As a side note, we are using Oracle for session management across a bunch of
servers. Sadly we can not use postgres for this (we would love too), sessions
are mostly updates and deletes, maybe when 7.2 comes out, but I'm still not
sure about that.)

</postgres use story>


Re: Re: OLAP, Aggregates, and order of operations

From
Tom Lane
Date:
mlw <markw@mohawksoft.com> writes:
> So, what would the order of operation be?

> I assume "my_array_constructor()" would be called first, and the return value
> then be passed to "my_aggregate()" along with the state value being set to the
> initial state, then subsequent calls to "my_array_constructor()", followed by
> "my_aggregate()" for each additional row in the group?

Check.
        regards, tom lane