Re: Proposal: Pre ordered aggregates, default ORDER BY clause for aggregates - median support - Mailing list pgsql-hackers

From Greg Stark
Subject Re: Proposal: Pre ordered aggregates, default ORDER BY clause for aggregates - median support
Date
Msg-id 407d949e0912210213u39b857b0qa842885d715079e6@mail.gmail.com
Whole thread Raw
In response to Re: Proposal: Pre ordered aggregates, default ORDER BY clause for aggregates - median support  (Pavel Stehule <pavel.stehule@gmail.com>)
Responses Re: Proposal: Pre ordered aggregates, default ORDER BY clause for aggregates - median support  (Pavel Stehule <pavel.stehule@gmail.com>)
List pgsql-hackers
On Mon, Dec 21, 2009 at 5:48 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> Information about count of rows are not detected in planner time. This
>  have to by available in any executor's sort node. So this value will
> be available every time - index scan is problem. Interesting is Greg's
> info about O(N) algorithms. Then we can implement median as classic
> aggregate.
>...
> On second hand - any internal implementation of median will be faster,
> then currently used techniques.

Some further information now that I'm on a real keyboard.

The O(n) algorithm for median of which I'm aware is Quickselect. It's
based on Quicksort which makes it unsuitable for a disk-based
algorithm since it would have to read every block many times. Perhaps
there's some way to adapt it or there might be some other O(n)
algorithm which has better i/o patterns.

But in cases where the tupleset/tuplesort is still in memory it would
be easy to add support for pulling out the nth element and use that in
a magic median() function. It wouldn't be a "classic" aggregate, at
least as I understand it where you also need something like O(1) space
to store the state, because you definitely need access to the whole
tupleset to do the quickselect.

If we don't find a way to optimize this for on-disk tuplestores though
then I wonder whether it's worth it. It would only help in the narrow
case that you had a large enough set to see the difference between
doing quickselect and quicksort but small enough to fit in workmem. I
suppose that case is widening over time though as memory sizes get
bigger and bigger.

--
greg


pgsql-hackers by date:

Previous
From: Hiroyuki Yamada
Date:
Subject: Re: alpha3 release schedule?
Next
From: Tim Bunce
Date:
Subject: Minimum perl version supported