Re: wip: functions median and percentile - Mailing list pgsql-hackers
From | Pavel Stehule |
---|---|
Subject | Re: wip: functions median and percentile |
Date | |
Msg-id | AANLkTingNyxYVhn_Hi9Wbs4TJvetmiKNUjambgQ2ppLj@mail.gmail.com Whole thread Raw |
In response to | Re: wip: functions median and percentile (Dean Rasheed <dean.a.rasheed@gmail.com>) |
Responses |
Re: wip: functions median and percentile
|
List | pgsql-hackers |
2010/10/11 Dean Rasheed <dean.a.rasheed@gmail.com>: > On 10 October 2010 22:16, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> It was pointed out upthread that while median isn't presently >> in the standard, Oracle defines it in terms of percentile_cont(0.5) >> which *is* in the standard. What I read in SQL:2008 is that >> percentile_cont is defined for all numeric types (returning >> approximate numeric with implementation-defined precision), >> and for interval (returning interval), and not for any other >> input type. So it appears to me that what we ought to support >> is >> median(float8) returns float8 >> median(interval) returns interval >> and nothing else --- we can rely on implicit casting to convert >> any other numeric input type to float8. >> > > Yeah that would be much simpler. > > BTW, why has percentile been removed from this patch? As the more > general, and SQL standard function, that would seem to be the more > useful one to include. Upthread it was mentioned that there is already > an ntile window function, but actually that's a completely different > thing. The reason for removing was impossibility to specify so some parameter must by immutable - in this case p parameter should be immutable otherwise the result is undefined. Regards Pavel Stehule > > >> BTW, as far as the implementation issues go, telling tuplesort that it >> can use gigabytes of memory no matter what seems quite unacceptable. >> Put this thing into a hash aggregation and you'll blow out your memory >> in no time. I don't think it's even a good idea to use work_mem there. > > Argh! Yes that sounds like a much more serious problem. > > Interestingly I couldn't seem to produce this effect. Every effort I > make to write a query to test this with median ends up being executed > using a GroupAggregate, while the equivalent query with avg uses a > HashAggregate. I don't understand why they are being treated > differently. > > >> I wonder whether it'd be a good idea to augment AggCheckCallContext() >> so that there's a way for aggregates to find out how much memory they >> ought to try to use. In a simple aggregation situation it's probably >> OK to use work_mem, but in a hash aggregation you'd better use less >> --- perhaps work_mem divided by the number of groups expected. > > Wouldn't that risk not allowing any memory at all the to aggregate in > some cases? I don't have a better idea mind you, short of somehow not > allowing hash aggregation for this function. > > >> Also, I believe that the lack-of-cleanup problem for tuplesorts spilling >> to disk should be fixable by using an exprcontext shutdown callback (see >> RegisterExprContextCallback). > > Ah! I wasn't aware of such a callback. Sounds perfect for the job. > > Regards, > Dean > > >> >> Comments? >> >> regards, tom lane >> >
pgsql-hackers by date: