Re: wip: functions median and percentile - Mailing list pgsql-hackers
From | David Fetter |
---|---|
Subject | Re: wip: functions median and percentile |
Date | |
Msg-id | 20100923184847.GC19104@fetter.org Whole thread Raw |
In response to | Re: wip: functions median and percentile (Pavel Stehule <pavel.stehule@gmail.com>) |
Responses |
Re: wip: functions median and percentile
|
List | pgsql-hackers |
On Thu, Sep 23, 2010 at 08:27:38PM +0200, Pavel Stehule wrote: > 2010/9/23 Hitoshi Harada <umi.tanuki@gmail.com>: > > 2010/9/23 Pavel Stehule <pavel.stehule@gmail.com>: > >> Hello > >> > >> 2010/9/22 Hitoshi Harada <umi.tanuki@gmail.com>: > >>> 2010/9/22 Pavel Stehule <pavel.stehule@gmail.com>: > >>>> Hello > >>>> > >>>> I found probably hard problem in cooperation with window functions :( > >> > >> maybe I was confused. I found a other possible problems. > >> > >> The problem with median function is probably inside a final function > >> implementation. Actually we request possibility of repetitive call of > >> final function. But final function call tuplesort_end function and > >> tuplesort_performsort. These function changes a state of tuplesort. > >> The most basic question is "who has to call tuplesort_end function and > >> when? > > > > Reading the comment in array_userfuncs.c, array_agg_finalfn() doesn't > > clean up its internal state at all and tells it's the executor's > > responsibility to clear memory. It is allowed since ArrayBuildState is > > only in-memory state. In the other hand, TupleSort should be cleared > > by calling tuplesort_end() if it has tapeset member (on file based > > sort) to close physical files. > > > > So 2 or 3 ways to go in my mind: > > it is little bit worse - we cannot to call tuplesort_performsort repetitive. > > > > > 1. call tuplesort_begin_datum with INT_MAX workMem rather than the > > global work_mem, to avoid it spills out sort state to files. It may > > sounds dangerous, but actually memory exhausting can happen in > > array_agg() as well. > > > > 2. add TupleSort an argument that tells not to use file at all. This > > results in the same as #1 but more generic approach. > > > > 3. don't use tuplesort in median() but implement its original sort > > management. This looks quite redundant and like maintenance problem. > > > > #2 sounds like the best in generic and consistent way. The only point > > is whether the change is worth for implementing median() as it's very > > system-wide common fundamentals. > > > > Other options? > > #4 block median under window clause > > #5 use a C array instead tuplesort under window clause. It is very > unpractical to use a windows clauses with large datasets, so it should > not be a problem. More, this can be very quick, because for C array we > can use a qsort function. > > Now I prefer #5 - it can be fast for using inside windows clause and > safe when window clause will not be used. If there's some way to do this using the same code in the windowing and non-windowing case, that would be much, much better from an architectural point of view. Single Point of Truth and all that. Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
pgsql-hackers by date: