Re: Proposal: Pre ordered aggregates, default ORDER BY clause for aggregates - median support - Mailing list pgsql-hackers

From Andrew Gierth
Subject Re: Proposal: Pre ordered aggregates, default ORDER BY clause for aggregates - median support
Date
Msg-id 87zl5de1vc.fsf@news-spur.riddles.org.uk
Whole thread Raw
In response to Re: Proposal: Pre ordered aggregates, default ORDER BY clause for aggregates - median support  (Pavel Stehule <pavel.stehule@gmail.com>)
Responses Re: Proposal: Pre ordered aggregates, default ORDER BY clause for aggregates - median support  (Greg Stark <gsstark@mit.edu>)
Re: Proposal: Pre ordered aggregates, default ORDER BY clause for aggregates - median support  (Pavel Stehule <pavel.stehule@gmail.com>)
List pgsql-hackers
>>>>> "Pavel" == Pavel Stehule <pavel.stehule@gmail.com> writes:
> 2009/12/20 Tom Lane <tgl@sss.pgh.pa.us>:>> I think that we've already expanded the capabilities of aggregates>> a
greatdeal for 8.5, and we should let it sit as-is for a release>> or two and see what the real user demand is for
additional>>features.>> >> I'm particularly concerned by the fact that the feature set is>> already far out in front of
whatthe planner can optimize>> effectively (e.g., there's no ability to combine the work when>> multiple aggregates
needthe same sorted data).  The more features>> we add on speculation, the harder it's going to be to close that>> gap. 

I absolutely agree with Tom here and for some quite specific reasons.

An optimal (or at least more optimal than is currently possible)
implementation of median() on top of the ordered-agg code as it stands
requires additions to the aggregate function interface: the median agg
implementation would have to, as a minimum, know how many rows of
sorted input are available. In addition, it would be desirable for it
to have direct (and possibly bidirectional) access to the tuplesort.

Now, if we look at how ordered aggs ought to be optimized, it's clear
that the planner should take the ordering costs into account and
consider plans that order the input instead. Once you do this, then
there's no longer any pre-computed count or tuplesort object
available, so if you'd implemented a better median() before the
optimizations, you'd end up either having to forgo the optimization or
break the median() agg again; clearly not something we want.

Plus, a feature that I intentionally omitted from the ordered-aggs
patch is the ability to do ordered-aggs as window functions. This is
in the spec, but (a) there were conflicting patches for window
functions on the table and (b) in my opinion, much of the work needed
to implement ordered-agg-as-window-func in an effective manner is
dependent on doing more work on optimization first (or at least will
potentially become easier as a result of that work).

So I think both the optimization issue and the window functions issue
would be best addressed before trying to build any additional features
on top of what we have so far.

--
Andrew (irc:RhodiumToad)


pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: fdw validation function vs zero catalog id
Next
From: Tom Lane
Date:
Subject: Re: creating index names automatically?