Re: Aggregate Supporting Functions - Mailing list pgsql-hackers

From Kevin Grittner
Subject Re: Aggregate Supporting Functions
Date
Msg-id 528331008.8047210.1433864566500.JavaMail.yahoo@mail.yahoo.com
In response to Re: Aggregate Supporting Functions  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Kevin Grittner <kgrittn@ymail.com> writes:
>> David Rowley <david.rowley@2ndquadrant.com> wrote:
>>> [ avoid duplicate calculations for related aggregates ]

>> From the information you have proposed storing, with cost factors
>> associated with the functions, it seems technically possible to
>> infer that you could run (for example) the avg() aggregate to
>> accumulate both but only run the final functions of the aggregates
>> referenced by the query.  That seems like an optimization to try
>> hard to forget about until you have at least one real-world use
>> case where it would yield a significant benefit.  It seems
>> premature to optimize for that before having the rest working.

> Actually, I would suggest that you forget about all the other aspects
> and *just* do that, because it could be made to work today on existing
> aggregate functions, and it would not require hundreds-to-thousands
> of lines of boilerplate support code in the grammar, catalog support,
> pg_dump, yadda yadda.  That is, look to see which aggregates use the
> same transition function and run that just once.

I was responding to David's suggestion that this particular query
be optimized by using the transition function from avg():
 SELECT sum(x), count(x) FROM bigtable;

Reviewing what you said caused me to notice something that I had
missed before -- that sum() and avg() share a transition function
(at least for some input types, such as numeric).  Still, that
function is not used for count(), so I don't see how that fits
into what you're saying above.

I agree that what you're suggesting does allow access to some very
low-hanging fruit that I had not noticed; it makes sense to get
that first.
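
To illustrate the low-hanging fruit, here is a minimal Python sketch (not
PostgreSQL code) of running a shared transition function once per row and
applying each aggregate's own final function afterward.  The names
avg_accum, sum_final, avg_final, AGGREGATES and run_aggregates are
hypothetical stand-ins for catalog entries and executor logic, and the
(count, sum) state is a simplifying assumption:

```python
from collections import defaultdict

# Hypothetical stand-ins for catalog entries; not real PostgreSQL functions.
def avg_accum(state, x):
    """Shared transition function: track (count, sum) of non-null inputs."""
    if x is None:
        return state
    n, total = state
    return (n + 1, total + x)

def sum_final(state):
    n, total = state
    return total if n else None

def avg_final(state):
    n, total = state
    return total / n if n else None

# aggregate name -> (transition function, final function)
AGGREGATES = {
    "sum": (avg_accum, sum_final),
    "avg": (avg_accum, avg_final),
}

def run_aggregates(names, rows):
    """Run each distinct transition function once per row, then finalize."""
    by_transfn = defaultdict(list)
    for name in names:
        transfn, finalfn = AGGREGATES[name]
        by_transfn[transfn].append((name, finalfn))

    # Assumption: every aggregate here starts from the same (0, 0) state.
    states = {transfn: (0, 0) for transfn in by_transfn}
    calls = 0
    for x in rows:
        for transfn in states:
            states[transfn] = transfn(states[transfn], x)
            calls += 1

    results = {}
    for transfn, finals in by_transfn.items():
        for name, finalfn in finals:
            results[name] = finalfn(states[transfn])
    return results, calls
```

With both sum and avg requested over four rows, the sketch makes four
transition calls rather than eight.  A real implementation would also have
to check that the aggregates take the same input and start from the same
initial state before sharing.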

> The rest of what David is thinking about could be done in a followon
> version by allowing the same aggregate to be implemented by any of several
> transition-function/final-function pairs, then teaching the planner to
> prefer pairs that let the same transition function be used for several
> aggregates.  But I'd see that as a later refinement that might well fail
> the bang-for-buck test, and hence shouldn't be the first step.

Well, that part will be a little tricky, because it would be
infrastructure which would allow what are likely significant
optimizations in several other features.  There's a bit of a
chicken-and-egg problem, because these other optimizations can't be
written without the infrastructure, and the infrastructure will not
show its full worth until the other optimizations are able to take
advantage of it.  But we can cross that bridge when we come to it.
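
As a rough sketch of the later refinement, assume each aggregate could
list several transition-function/final-function pairs, and the planner
picked the combination needing the fewest distinct transition functions.
Everything named here (CANDIDATES, choose_pairs, sum_accum, count_inc,
count_final and so on) is hypothetical, and the exhaustive search ignores
the cost factors a real planner would weigh:

```python
from itertools import product

# Hypothetical candidates: aggregate -> list of (transition fn, final fn).
CANDIDATES = {
    "sum":   [("sum_accum", None), ("avg_accum", "sum_final")],
    "count": [("count_inc", None), ("avg_accum", "count_final")],
    "avg":   [("avg_accum", "avg_final")],
}

def choose_pairs(names):
    """Pick one pair per aggregate, minimizing distinct transition functions."""
    best = None
    for combo in product(*(CANDIDATES[n] for n in names)):
        distinct = len({transfn for transfn, _ in combo})
        # Strict comparison keeps the earliest-listed pairs on ties.
        if best is None or distinct < best[0]:
            best = (distinct, dict(zip(names, combo)))
    return best[1]
```

For the sum(x), count(x) query this picks avg_accum for both aggregates,
so one transition function runs per row.  For a lone sum(x) it keeps the
first-listed, presumably cheaper, pair.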

This doesn't look to me like the traditional 80/20 rule.  I think
the easy stuff might be 5% of the benefit for 1% of the work, which
is still a better bang for the buck than the other work.

What you are proposing as an alternative to what David proposed for
the later work looks (on the face of it) like a bigger, more
complicated mechanism than he proposed, but more flexible if it can
be made to work.  What I'd hate to see is failure to get a toaster 
because it's too expensive to get one that also bakes pizza.  We're 
gonna want to make a lot of toast over the next few years.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
