Re: wip: functions median and percentile - Mailing list pgsql-hackers

From Pavel Stehule
Subject Re: wip: functions median and percentile
Date
Msg-id AANLkTik=uxdBr2NRf1iL5R-0jLJzafRY3pDvke1txLAU@mail.gmail.com
In response to Re: wip: functions median and percentile  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
2010/10/14 Robert Haas <robertmhaas@gmail.com>:
> On Wed, Oct 13, 2010 at 6:56 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>> 2010/10/13 Pavel Stehule <pavel.stehule@gmail.com>:
>>> 2010/10/13 Peter Eisentraut <peter_e@gmx.net>:
>>>> On mån, 2010-10-11 at 20:46 +0200, Pavel Stehule wrote:
>>>>> The problem is in the interface. The original patch did it, but I removed
>>>>> it. We cannot ensure immutability of some parameters now.
>>>>
>>>> How about you store the "immutable" parameter in the transition state
>>>> and error out if it changes between calls?
>>>>
>>>
>>> Yes, it's a possibility. Looking at it now, I see conformance with
>>> ANSI SQL as the more important problem; see my last post.
>>> There can be a kind of aggregate function based on tuplesort.
>>
>> Moreover, all these functions need to solve the same problem with the
>> planner and hash agg. So maybe it is time to add an ISTUPLESORTED flag
>> to pg_proc and solve these functions together.
>
> I think that the design of this patch is still sufficiently up in the
> air that it is not going to be practical to get it committed during
> the current CommitFest, which is nearly over, so I am going to mark it
> as Returned with Feedback.  I suggest that we continue discussing it,
> however, so that we can get things squared away for the next
> CommitFest.  It seems that the fundamental question here is whether
> median is an instance of some more general problem, or whether it's a
> special case; and more importantly, if it is an instance of a more
> general problem, what is the shape of that general problem?

+1

It would be better to implement median as a special case of a more
general class of functions. The ANSI SQL functions cover a more general
use case, but that needs a discussion about design. I didn't find much
consistency in the standard: these functions are defined individually,
not as some special kind of function. All of the functions in the
standard have immutable parameters, but Oracle's listagg function has
one mutable parameter and one immutable parameter.
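
To make the immutable-parameter question concrete, here is a rough
sketch of Peter's suggestion quoted above: keep the parameter in the
transition state and raise an error if it changes between calls. It is
Python rather than C, all names are hypothetical, a plain list stands in
for the tuplesort, and the nearest-rank rule in the final function is
only an assumption for illustration, not what the patch does.

```python
import math


class PercentileState:
    """Hypothetical transition state for a percentile-style aggregate."""

    def __init__(self, fraction):
        self.fraction = fraction  # parameter that must stay fixed per group
        self.values = []          # stand-in for a tuplesort


def percentile_transition(state, value, fraction):
    """Called once per input row; returns the updated state."""
    if state is None:
        # First row of the group: remember the parameter.
        state = PercentileState(fraction)
    elif fraction != state.fraction:
        # The parameter changed between calls: error out.
        raise ValueError("percentile fraction must not change within a group")
    if value is not None:
        state.values.append(value)
    return state


def percentile_final(state):
    """Nearest-rank percentile (an assumed definition for this sketch)."""
    if state is None or not state.values:
        return None
    ordered = sorted(state.values)
    rank = max(1, math.ceil(state.fraction * len(ordered)))
    return ordered[rank - 1]
```

With this shape, a call such as percentile(x, 0.5) works as usual, while
percentile(x, y) with a varying y fails on the first row where y differs
from the stored value.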

Moreover, we should work out how these functions can be used together
with a window clause. I didn't find any note in the standard about
combining them, but Oracle at least allows the within_group and over
clauses together.

On the other hand, this work can be really useful. We would get better
conformance with ANSI SQL 200x and with other databases: DB2, Oracle,
MSSQL, and Sybase all support this feature.

>
> Or to put it more bluntly - what is the "problem with planner and hash
> agg" that all of these functions need to solve?  And why does it need
> a flag in pg_proc?  Why can't we leave it to the individual
> functions to perform a sort if one is needed?

These functions are memory-hungry: each one takes about 30 kB (the
minimum for a tuplesort) per group. Moreover, there is a fairly general
pattern that can be shared: there can be at least 6 functions that just
fill a tuplesort on each iteration, so this code can be shared, and the
tuplesort can be reset and reused. It is also a question whether the
requested sort can be used in outer optimizations. The primary thing to
solve is memory usage.
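
A back-of-the-envelope sketch of the memory argument, in Python for
brevity. It assumes the 30 kB figure above, and it assumes that a hashed
aggregate keeps every group's state alive until the end of input, while
sorted input needs only one state that is reset between groups. The
function names are made up for this example and a list stands in for
the tuplesort.

```python
from itertools import groupby

MIN_TUPLESORT_BYTES = 30 * 1024  # the ~30 kB per-group minimum cited above


def hashed_peak_bytes(n_groups):
    """All group states live at once (assumed hash agg behaviour)."""
    return n_groups * MIN_TUPLESORT_BYTES


def sorted_peak_bytes(n_groups):
    """Input sorted by group key: one state, reset and reused per group."""
    return MIN_TUPLESORT_BYTES if n_groups else 0


def medians_sorted_input(rows):
    """rows: (group_key, value) pairs already ordered by group_key."""
    buf = []  # single shared buffer standing in for one tuplesort
    result = {}
    for key, group in groupby(rows, key=lambda r: r[0]):
        buf.clear()                       # reset instead of reallocating
        buf.extend(v for _, v in group)   # fill the buffer for this group
        buf.sort()
        mid = len(buf) // 2
        if len(buf) % 2:
            result[key] = buf[mid]
        else:
            result[key] = (buf[mid - 1] + buf[mid]) / 2
    return result
```

For 100,000 groups the hashed case needs at least 3,072,000,000 bytes
(about 2.9 GB) before a single value is stored, while the sorted case
stays at one 30 kB state.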

Regards

Pavel Stehule


>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>

