Re: Combining Aggregates - Mailing list pgsql-hackers

From Atri Sharma
Subject Re: Combining Aggregates
Msg-id CAOeZVid3R6SV7R2EFvK36YzWMEU3g5rYJKAUNQqKcP3crTFMew@mail.gmail.com
In response to Re: Combining Aggregates  (Kouhei Kaigai <kaigai@ak.jp.nec.com>)
Responses Re: Combining Aggregates  (Kouhei Kaigai <kaigai@ak.jp.nec.com>)
List pgsql-hackers


On Wed, Dec 17, 2014 at 6:05 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> Simon,
>
> The concept looks good to me. I think the new combined function should
> take a state data type as its argument and update the state object of
> the aggregate function. In other words, the combined function behaves like
> a transition function, but updates the state object from a summary
> of multiple rows. Right?
>
> It also needs some enhancement around the Aggref/AggrefExprState structures
> to indicate which function should be called at execution time.
> Combined functions are usually not needed. AggrefExprState updates its
> internal state using the transition function row by row. However, once an
> aggregate function is pushed down across table joins, combined functions
> have to be called instead of transition functions.
> I'd like to suggest that Aggref get a new flag to indicate that the
> aggregate expects a state object instead of a scalar value.
>
> Also, I'd like to suggest one other flag in Aggref so that it does not
> generate the final result and returns the state object instead.



So are you proposing not calling transfuncs at all and just using combined functions?

That sounds counterintuitive to me. I don't see why you would want to avoid transfns entirely, even in the case of pushing down aggregates that you mentioned.

From Simon's example mentioned upthread:

PRE-AGGREGATED PLAN
Aggregate
-> Join
     -> PreAggregate (doesn't call finalfn)
          -> Scan BaseTable1
     -> Scan BaseTable2

finalfn wouldn't be called in the PreAggregate node. Instead, the combined function would be responsible for taking the preaggregate results and combining them (unless, of course, I am missing something).
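
To make the plan above concrete, here is a rough sketch in Python rather than executor code. Everything in it is an assumption for illustration: the function names (avg_transfn, avg_combinefn, avg_finalfn), the (sum, count) state for avg(), the sample tables, and the simplification that the join key is unique in BaseTable2 so no partial state is duplicated by the join.

```python
from collections import defaultdict

# Hypothetical avg() aggregate: the state is a (sum, count) pair.
INIT = (0, 0)

def avg_transfn(state, value):
    """Transition function: fold one input row into the state."""
    s, n = state
    return (s + value, n + 1)

def avg_combinefn(a, b):
    """Combine function: merge two states, each summarizing many rows."""
    return (a[0] + b[0], a[1] + b[1])

def avg_finalfn(state):
    """Final function: turn the state into the result value."""
    s, n = state
    return s / n if n else None

# BaseTable1: (join_key, value). BaseTable2: (join_key, region), key unique.
base_table1 = [(1, 10), (1, 20), (2, 30), (3, 40), (3, 60)]
base_table2 = [(1, "east"), (2, "east"), (3, "west")]

def pre_aggregate(rows):
    """PreAggregate: runs transfn per join key and emits raw states."""
    states = defaultdict(lambda: INIT)
    for key, value in rows:
        states[key] = avg_transfn(states[key], value)
    return dict(states)  # finalfn is deliberately not called here

def aggregate_over_join(partials, dim_rows):
    """Join + top Aggregate: combines partial states per region, then finalizes."""
    combined = defaultdict(lambda: INIT)
    for key, region in dim_rows:
        if key in partials:
            combined[region] = avg_combinefn(combined[region], partials[key])
    return {region: avg_finalfn(state) for region, state in combined.items()}

def plain_plan(rows, dim_rows):
    """Reference plan: join first, then aggregate row by row with transfn."""
    lookup = dict(dim_rows)
    states = defaultdict(lambda: INIT)
    for key, value in rows:
        if key in lookup:
            states[lookup[key]] = avg_transfn(states[lookup[key]], value)
    return {region: avg_finalfn(state) for region, state in states.items()}

result = aggregate_over_join(pre_aggregate(base_table1), base_table2)
```

Note that transfn is still used, in the PreAggregate step, and combinefn only appears above the join. Under the stated assumptions both plans give the same per-region averages.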

Special-casing transition state updating in Aggref seems like a bad idea to me. I think it would be better to make this more explicit, i.e. add a new node on top that does the combination (it would be primarily responsible for calling the combined function).

SQL Server may not be a good source of inspiration, but the way it does this (Exchange operator + Stream Aggregate) seems intuitive to me, and having the combination operation as a separate top node might be a cleaner way.
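
A minimal sketch of what a separate combining node might look like, again in Python and purely illustrative. The class names (AggSpec, PartialAggNode, CombineAggNode) are hypothetical and do not correspond to any existing PostgreSQL structure; max() is used as the example aggregate.

```python
from typing import Any, Callable, Iterable, Iterator, List, NamedTuple

class AggSpec(NamedTuple):
    """Hypothetical bundle of the functions that define one aggregate."""
    init: Any
    transfn: Callable[[Any, Any], Any]
    combinefn: Callable[[Any, Any], Any]
    finalfn: Callable[[Any], Any]

class PartialAggNode:
    """Lower node: applies transfn row by row and emits the raw state."""
    def __init__(self, spec: AggSpec, child: Iterable[Any]):
        self.spec = spec
        self.child = child

    def rows(self) -> Iterator[Any]:
        state = self.spec.init
        for value in self.child:
            state = self.spec.transfn(state, value)
        yield state  # no finalfn at this level

class CombineAggNode:
    """Top node: the only place that calls combinefn, followed by finalfn."""
    def __init__(self, spec: AggSpec, children: List[PartialAggNode]):
        self.spec = spec
        self.children = children

    def rows(self) -> Iterator[Any]:
        state = self.spec.init
        for child in self.children:
            for partial in child.rows():
                state = self.spec.combinefn(state, partial)
        yield self.spec.finalfn(state)

# max() as an example; None stands for "no rows seen yet".
max_spec = AggSpec(
    init=None,
    transfn=lambda s, v: v if s is None or v > s else s,
    combinefn=lambda a, b: b if a is None else a if b is None else max(a, b),
    finalfn=lambda s: s,
)

plan = CombineAggNode(
    max_spec,
    [PartialAggNode(max_spec, [3, 9, 4]), PartialAggNode(max_spec, [7, 2])],
)
answer = next(plan.rows())
```

With this shape the lower nodes never need to know whether their output will be combined, and no flag is needed to switch a single node between transfn and combinefn behaviour.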

I may be wrong though.

Regards,

Atri
