
Re: Aggregate Supporting Functions - Mailing list pgsql-hackers

From	Kevin Grittner
Subject	Re: Aggregate Supporting Functions
Date	June 9, 2015 18:43:06
Msg-id	528331008.8047210.1433864566500.JavaMail.yahoo@mail.yahoo.com
In response to	Re: Aggregate Supporting Functions (Tom Lane <tgl@sss.pgh.pa.us>)
List	pgsql-hackers


Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Kevin Grittner <kgrittn@ymail.com> writes:
>> David Rowley <david.rowley@2ndquadrant.com> wrote:
>>> [ avoid duplicate calculations for related aggregates ]

>> From the information you have proposed storing, with cost factors
>> associated with the functions, it seems technically possible to
>> infer that you could run (for example) the avg() aggregate to
>> accumulate both but only run the final functions of the aggregates
>> referenced by the query.  That seems like an optimization to try
>> hard to forget about until you have at least one real-world use
>> case where it would yield a significant benefit.  It seems
>> premature to optimize for that before having the rest working.

> Actually, I would suggest that you forget about all the other aspects
> and *just* do that, because it could be made to work today on existing
> aggregate functions, and it would not require hundreds-to-thousands
> of lines of boilerplate support code in the grammar, catalog support,
> pg_dump, yadda yadda.  That is, look to see which aggregates use the
> same transition function and run that just once.
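A minimal Python sketch of the idea quoted above, assuming a simplified model rather than PostgreSQL's executor: each aggregate is a (transition function, final function) pair, aggregates naming the same transition function share one state that is advanced once per row, and each aggregate then applies its own final function. The names `avg_accum`, `count_accum`, `sum_final`, `avg_final`, `count_final` and `eval_shared` are hypothetical stand-ins, and "same transition function" is approximated by object identity; a real implementation would also have to compare arguments, initial conditions and state types.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AvgState:
    """Hypothetical transition state: enough to finish sum(), avg() or count()."""
    count: int = 0
    total: float = 0.0


# Illustrative stand-ins for catalog functions (assumed, not PostgreSQL's).
def avg_accum(st: AvgState, x: float) -> None:
    st.count += 1
    st.total += x


def count_accum(st: AvgState, x: float) -> None:
    st.count += 1  # count() needs no running total


def sum_final(st: AvgState) -> float:
    return st.total


def avg_final(st: AvgState) -> Optional[float]:
    return st.total / st.count if st.count else None


def count_final(st: AvgState) -> int:
    return st.count


@dataclass(frozen=True)
class Aggregate:
    name: str
    trans: Callable[[AvgState, float], None]
    final: Callable[[AvgState], object]


def eval_shared(aggs, rows):
    """Run each distinct transition function once per row; return (results, calls)."""
    states = {}  # transition function -> the one state shared by its aggregates
    for agg in aggs:
        states.setdefault(agg.trans, AvgState())
    calls = 0
    for x in rows:
        for trans, st in states.items():
            trans(st, x)
            calls += 1
    # Every aggregate still applies its own final function to the shared state.
    results = {agg.name: agg.final(states[agg.trans]) for agg in aggs}
    return results, calls


# SELECT sum(x), avg(x), count(x): sum and avg share avg_accum in this model.
aggs = [
    Aggregate("sum", avg_accum, sum_final),
    Aggregate("avg", avg_accum, avg_final),
    Aggregate("count", count_accum, count_final),
]
results, calls = eval_shared(aggs, [1, 2, 3, 4])
# results == {"sum": 10.0, "avg": 2.5, "count": 4}; calls == 8 instead of 12
```

Note that count() still gets its own state here, because it uses a different transition function; sharing only helps aggregates that already agree on one.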

I was responding to David's suggestion that this particular query
be optimized by using the transition function from avg():
    SELECT sum(x), count(x) FROM bigtable;

Reviewing what you said caused me to notice something that I had
missed before -- that sum() and avg() share a transition function.
Still, that function is not used for count(), so I don't see how
that fits into what you're saying above.

I agree that what you're suggesting does allow access to some very
low-hanging fruit that I had not noticed; it makes sense to get
that first.

> The rest of what David is thinking about could be done in a followon
> version by allowing the same aggregate to be implemented by any of several
> transition-function/final-function pairs, then teaching the planner to
> prefer pairs that let the same transition function be used for several
> aggregates.  But I'd see that as a later refinement that might well fail
> the bang-for-buck test, and hence shouldn't be the first step.
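The follow-on version described above can be sketched as a small search problem, under loudly stated assumptions: a hypothetical catalog `IMPLEMENTATIONS` lists alternative (transition function, final function) pairs per aggregate, and a hypothetical `choose_pairs` picks one pair per aggregate so that the fewest distinct transition functions have to run. The function names, including `count_from_avg_final`, are invented for illustration, and the brute-force search is exponential in the number of aggregates; a real planner would presumably weigh per-function costs rather than just counting.

```python
from itertools import product

# Hypothetical catalog: each aggregate lists alternative (transfn, finalfn) pairs,
# with its native implementation first.
IMPLEMENTATIONS = {
    "sum": [("avg_accum", "sum_final")],
    "avg": [("avg_accum", "avg_final")],
    "count": [("count_accum", "count_final"),
              ("avg_accum", "count_from_avg_final")],
}


def choose_pairs(aggs, impls=IMPLEMENTATIONS):
    """Pick one pair per aggregate, minimizing distinct transition functions.

    Returns (number_of_transition_functions, {aggregate: (transfn, finalfn)}).
    Ties keep the earliest combination, i.e. the native implementations.
    """
    best = None
    for choice in product(*(impls[a] for a in aggs)):
        ntrans = len({trans for trans, _final in choice})
        if best is None or ntrans < best[0]:
            best = (ntrans, dict(zip(aggs, choice)))
    return best


# David's example, SELECT sum(x), count(x): count() switches to avg_accum's state.
print(choose_pairs(["sum", "count"]))
```

Because only a strictly smaller count replaces the current best, a query using count() alone keeps `count_accum`; the alternative pair is chosen only when it actually lets a transition function be shared.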

Well, that part will be a little tricky, because it would be
infrastructure which would allow what are likely significant
optimizations in several other features.  There's a bit of a
chicken-and-egg problem, because these other optimizations can't be
written without the infrastructure, and the infrastructure will not
show its full worth until the other optimizations are able to take
advantage of it.  But we can cross that bridge when we come to it.

This doesn't look to me like the traditional 80/20 rule.  I think
the easy stuff might yield 5% of the benefit for 1% of the work,
but that is still better bang for the buck than the other work.

What you are proposing as an alternative to what David proposed for
the later work looks (on the face of it) like a bigger, more
complicated mechanism than he proposed, but more flexible if it can
be made to work.  What I'd hate to see is failure to get a toaster 
because it's too expensive to get one that also bakes pizza.  We're 
gonna want to make a lot of toast over the next few years.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
