Home > mailing lists

Re: Proposal: Pre ordered aggregates, default ORDER BY clause for aggregates - median support - Mailing list pgsql-hackers

From	Greg Stark
Subject	Re: Proposal: Pre ordered aggregates, default ORDER BY clause for aggregates - median support
Date	December 21, 2009 06:13:32
Msg-id	407d949e0912210213u39b857b0qa842885d715079e6@mail.gmail.com Whole thread Raw
In response to	Re: Proposal: Pre ordered aggregates, default ORDER BY clause for aggregates - median support (Pavel Stehule <pavel.stehule@gmail.com>)
Responses	Re: Proposal: Pre ordered aggregates, default ORDER BY clause for aggregates - median support
List	pgsql-hackers

Tree view

On Mon, Dec 21, 2009 at 5:48 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> Information about count of rows are not detected in planner time. This
>  have to by available in any executor's sort node. So this value will
> be available every time - index scan is problem. Interesting is Greg's
> info about O(N) algorithms. Then we can implement median as classic
> aggregate.
>...
> On second hand - any internal implementation of median will be faster,
> then currently used techniques.

Some further information now that I'm on a real keyboard.

The O(n) algorithm for median of which I'm aware is Quickselect. It's
based on Quicksort which makes it unsuitable for a disk-based
algorithm since it would have to read every block many times. Perhaps
there's some way to adapt it or there might be some other O(n)
algorithm which has better i/o patterns.

But in cases where the tupleset/tuplesort is still in memory it would
be easy to add support for pulling out the nth element and use that in
a magic median() function. It wouldn't be a "classic" aggregate, at
least as I understand it where you also need something like O(1) space
to store the state, because you definitely need access to the whole
tupleset to do the quickselect.

If we don't find a way to optimize this for on-disk tuplestores though
then I wonder whether it's worth it. It would only help in the narrow
case that you had a large enough set to see the difference between
doing quickselect and quicksort but small enough to fit in workmem. I
suppose that case is widening over time though as memory sizes get
bigger and bigger.

--
greg

pgsql-hackers by date:

From: Hiroyuki Yamada
Date: 21 December 2009, 05:42:40
Subject: Re: alpha3 release schedule?

From: Tim Bunce
Date: 21 December 2009, 06:46:11
Subject: Minimum perl version supported

Re: Proposal: Pre ordered aggregates, default ORDER BY clause for aggregates - median support - Mailing list pgsql-hackers

Previous

Next