Re: wip: functions median and percentile - Mailing list pgsql-hackers

From Hitoshi Harada
Subject Re: wip: functions median and percentile
Date
Msg-id AANLkTinSbwJ9TJ_uOHR6VsQOg2yemd1cTstwqVB8096K@mail.gmail.com
In response to Re: wip: functions median and percentile  (Dean Rasheed <dean.a.rasheed@gmail.com>)
Responses Re: wip: functions median and percentile
List pgsql-hackers
2010/10/5 Dean Rasheed <dean.a.rasheed@gmail.com>:
> On 5 October 2010 07:04, Hitoshi Harada <umi.tanuki@gmail.com> wrote:
> Extrapolating from few quick timing tests, even in the best case, on
> my machine, it would take 7 days for the running median to use up
> 100MB, and 8 years for it to use 2GB. So setting the tuplesort's
> workMem to 2GB (only in the running median case) would actually be
> safe in practice, and would prevent the temp file leak (for a few
> years at least!). I feel dirty even suggesting that. Better ideas
> anyone?

So, I suggested implementing median as a *pure* window function,
separate from Pavel's aggregate function, and Greg suggested adding
insertion capability to tuplesort. With this approach, we keep a
tuplesort holding all the values in the current frame and can release
it at the end of a partition (this is possible through the window
function API). Values are added incrementally, which is far better
than O(n^2 log(n)), although I haven't estimated the exact order.
Only when the frame head moves down do we need to re-initialize the
tuplesort, and in that case it is as slow as calling the aggregate
version for each row (but I think we can solve that somehow if we
look into it closely).
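To illustrate the incremental idea, here is a minimal Python sketch. It is not PostgreSQL code. A sorted Python list maintained with `bisect.insort` stands in for a tuplesort with insertion capability, and the names `RunningFrameMedian` and `running_medians` are hypothetical. Frames are assumed to be half-open row ranges `[head, tail)` within one partition. When only the frame tail advances, just the new values are inserted. When the frame head moves, the sketch rebuilds from scratch, mirroring the re-initialization cost described above.

```python
from bisect import insort
from typing import List, Sequence


class RunningFrameMedian:
    """Hypothetical stand-in for a tuplesort that supports insertion.

    Keeps the values of the current window frame in sorted order so the
    median can be read directly. Frames are half-open ranges [head, tail)
    over the same partition sequence on every call.
    """

    def __init__(self) -> None:
        self.reinits = 0  # number of full rebuilds (frame head moved)
        self.reset()

    def reset(self) -> None:
        # End of partition: release the buffered values.
        self.sorted_vals: List[float] = []
        self.head = 0
        self.tail = 0

    def median(self, values: Sequence[float], head: int, tail: int) -> float:
        if head != self.head or tail < self.tail:
            # No deletion support: rebuild from scratch, which costs about
            # as much as running the aggregate version for this row.
            self.sorted_vals = sorted(values[head:tail])
            self.reinits += 1
        else:
            # Only the frame tail advanced: insert just the new values
            # (binary search for the position, then a list shift).
            for v in values[self.tail:tail]:
                insort(self.sorted_vals, v)
        self.head, self.tail = head, tail

        n = len(self.sorted_vals)
        if n == 0:
            raise ValueError("empty frame")
        mid = n // 2
        if n % 2 == 1:
            return self.sorted_vals[mid]
        return (self.sorted_vals[mid - 1] + self.sorted_vals[mid]) / 2


def running_medians(partition: Sequence[float]) -> List[float]:
    """Median for each row with frame ROWS BETWEEN UNBOUNDED PRECEDING
    AND CURRENT ROW: the head stays at 0, so no rebuild is ever needed."""
    state = RunningFrameMedian()
    result = [state.median(partition, 0, i + 1) for i in range(len(partition))]
    state.reset()
    return result
```

With an unbounded-preceding frame the head never moves, so each row costs one insertion instead of a full re-sort. A sliding frame such as ROWS BETWEEN 1 PRECEDING AND CURRENT ROW, by contrast, takes the rebuild path on every row after the first, which is exactly the case the post says still needs work.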

Regards,

-- 
Hitoshi Harada

