Thread: proposal: tuplestore, tuplesort aggregate functions

proposal: tuplestore, tuplesort aggregate functions

From
Pavel Stehule
Date:
Hello

I still thinking about a "median" type functions. My idea is to
introduce a new syntax for stype definition - like

stype = type, or
stype = ARRAY OF type [ ORDER [ DESC | ASC ]], or
stype = TUPLESTORE OF type, or
stype = TUPLESORT OF type [ DESC | ASC ]

when stype is ARRAY of then final and transistent functions can be a
PL functions. When stype isn't scalar, then sfunc can be undefined (it
use a buildin functions). Then we can implement a aggregate only with
final functions.

so median function can be defined:

CREATE FUNCTION num_median_final(internal) RETURNS numeric AS ...
CREATE AGGREGATE median(numeric) (stype = TUPLESORT OF numeric,
finalfunc = num_median_final);

This feature has impact primary on agg executor, and can be relative
simple - no planner changes (or not big), minimal parser changes.

Main reason for this feature is possible access to tuplesort and
tuplesort. I hope, so this can solve a problems with computing a
median and similar functions on very large datasets.

comments?

regards


Re: proposal: tuplestore, tuplesort aggregate functions

From
Tom Lane
Date:
Pavel Stehule <pavel.stehule@gmail.com> writes:
> I still thinking about a "median" type functions. My idea is to
> introduce a new syntax for stype definition - like

> stype = type, or
> stype = ARRAY OF type [ ORDER [ DESC | ASC ]], or
> stype = TUPLESTORE OF type, or
> stype = TUPLESORT OF type [ DESC | ASC ]

This seems like a fairly enormous amount of conceptual (and code)
infrastructure just to make it possible to build median() out of spare
parts.  It's also exposing some implementation details that I'd just as
soon not expose in SQL.  I'd rather just implement median as a
special-purpose aggregate.
        regards, tom lane


Re: proposal: tuplestore, tuplesort aggregate functions

From
Pavel Stehule
Date:
2010/8/18 Tom Lane <tgl@sss.pgh.pa.us>:
> Pavel Stehule <pavel.stehule@gmail.com> writes:
>> I still thinking about a "median" type functions. My idea is to
>> introduce a new syntax for stype definition - like
>
>> stype = type, or
>> stype = ARRAY OF type [ ORDER [ DESC | ASC ]], or
>> stype = TUPLESTORE OF type, or
>> stype = TUPLESORT OF type [ DESC | ASC ]
>
> This seems like a fairly enormous amount of conceptual (and code)
> infrastructure just to make it possible to build median() out of spare
> parts.  It's also exposing some implementation details that I'd just as
> soon not expose in SQL.  I'd rather just implement median as a
> special-purpose aggregate.

yes, it is little bit strange - but when we talked last time about
this topic, I understand, so you dislike any special solution for this
functionality. So I searched different more general way. On the other
hand, I agree so special purpose aggregate (with a few changes in
nodeAgg) can be enough. The median (and additional forms) is really
special and there are not wide used use case.

Regards

Pavel


>
>                        regards, tom lane
>


Re: proposal: tuplestore, tuplesort aggregate functions

From
David Fetter
Date:
On Wed, Aug 18, 2010 at 04:03:25PM +0200, Pavel Stehule wrote:
> 2010/8/18 Tom Lane <tgl@sss.pgh.pa.us>:
> > Pavel Stehule <pavel.stehule@gmail.com> writes:
> >> I still thinking about a "median" type functions. My idea is to
> >> introduce a new syntax for stype definition - like
> >
> >> stype = type, or
> >> stype = ARRAY OF type [ ORDER [ DESC | ASC ]], or
> >> stype = TUPLESTORE OF type, or
> >> stype = TUPLESORT OF type [ DESC | ASC ]
> >
> > This seems like a fairly enormous amount of conceptual (and code)
> > infrastructure just to make it possible to build median() out of
> > spare parts.  It's also exposing some implementation details that
> > I'd just as soon not expose in SQL.  I'd rather just implement
> > median as a special-purpose aggregate.
> 
> yes, it is little bit strange - but when we talked last time about
> this topic, I understand, so you dislike any special solution for
> this functionality. So I searched different more general way. On the
> other hand, I agree so special purpose aggregate (with a few changes
> in nodeAgg) can be enough. The median (and additional forms) is
> really special and there are not wide used use case.

Which median do you plan to implement?  Or do you plan to implement
several different medians, each with distinguishing names?

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Re: proposal: tuplestore, tuplesort aggregate functions

From
Pavel Stehule
Date:
2010/8/18 David Fetter <david@fetter.org>:
> On Wed, Aug 18, 2010 at 04:03:25PM +0200, Pavel Stehule wrote:
>> 2010/8/18 Tom Lane <tgl@sss.pgh.pa.us>:
>> > Pavel Stehule <pavel.stehule@gmail.com> writes:
>> >> I still thinking about a "median" type functions. My idea is to
>> >> introduce a new syntax for stype definition - like
>> >
>> >> stype = type, or
>> >> stype = ARRAY OF type [ ORDER [ DESC | ASC ]], or
>> >> stype = TUPLESTORE OF type, or
>> >> stype = TUPLESORT OF type [ DESC | ASC ]
>> >
>> > This seems like a fairly enormous amount of conceptual (and code)
>> > infrastructure just to make it possible to build median() out of
>> > spare parts.  It's also exposing some implementation details that
>> > I'd just as soon not expose in SQL.  I'd rather just implement
>> > median as a special-purpose aggregate.
>>
>> yes, it is little bit strange - but when we talked last time about
>> this topic, I understand, so you dislike any special solution for
>> this functionality. So I searched different more general way. On the
>> other hand, I agree so special purpose aggregate (with a few changes
>> in nodeAgg) can be enough. The median (and additional forms) is
>> really special and there are not wide used use case.
>
> Which median do you plan to implement?  Or do you plan to implement
> several different medians, each with distinguishing names?

my proposal enabled implementation of any "median like" function. But
if we implement median as special case of aggregate, then some basic
"median" will be implemented.

Regards

Pavel

>
> Cheers,
> David.
> --
> David Fetter <david@fetter.org> http://fetter.org/
> Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
> Skype: davidfetter      XMPP: david.fetter@gmail.com
> iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics
>
> Remember to vote!
> Consider donating to Postgres: http://www.postgresql.org/about/donate
>


Re: proposal: tuplestore, tuplesort aggregate functions

From
David Fetter
Date:
On Wed, Aug 18, 2010 at 04:10:18PM +0200, Pavel Stehule wrote:
> 2010/8/18 David Fetter <david@fetter.org>:
> > Which median do you plan to implement?  Or do you plan to implement
> > several different medians, each with distinguishing names?
> 
> my proposal enabled implementation of any "median like" function. But
> if we implement median as special case of aggregate, then some basic
> "median" will be implemented.

Apart from the medians, which "median-like" aggregates do you have in
mind to start with?  If you can provide examples of "median-like"
aggregates that people might need to implement as user-defined
aggregates, or other places where people would use this machinery, it
will make your case stronger for this refactoring.

Otherwise, it seems like a more reasonable thing to make the medians
special case code.

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Re: proposal: tuplestore, tuplesort aggregate functions

From
Tom Lane
Date:
David Fetter <david@fetter.org> writes:
> Apart from the medians, which "median-like" aggregates do you have in
> mind to start with?  If you can provide examples of "median-like"
> aggregates that people might need to implement as user-defined
> aggregates, or other places where people would use this machinery, it
> will make your case stronger for this refactoring.

There would be plenty of scope to re-use the machinery without any
SQL-level extensions.  All you need is a polymorphic aggregate
transition function that maintains a tuplestore or whatever.
I don't see that extra syntax in CREATE AGGREGATE is really buying
much of anything.
        regards, tom lane


Re: proposal: tuplestore, tuplesort aggregate functions

From
Pavel Stehule
Date:
2010/8/18 David Fetter <david@fetter.org>:
> On Wed, Aug 18, 2010 at 04:10:18PM +0200, Pavel Stehule wrote:
>> 2010/8/18 David Fetter <david@fetter.org>:
>> > Which median do you plan to implement?  Or do you plan to implement
>> > several different medians, each with distinguishing names?
>>
>> my proposal enabled implementation of any "median like" function. But
>> if we implement median as special case of aggregate, then some basic
>> "median" will be implemented.
>
> Apart from the medians, which "median-like" aggregates do you have in
> mind to start with?  If you can provide examples of "median-like"
> aggregates that people might need to implement as user-defined
> aggregates, or other places where people would use this machinery, it
> will make your case stronger for this refactoring.
>

I didn't think about some special median - this proposal is just about
aggregates with large a transistent data. Then the access to
tuplestore can be very usefull.

> Otherwise, it seems like a more reasonable thing to make the medians
> special case code.

yes, minimally for this moment.

Regards

Pavel

>
> Cheers,
> David.
> --
> David Fetter <david@fetter.org> http://fetter.org/
> Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
> Skype: davidfetter      XMPP: david.fetter@gmail.com
> iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics
>
> Remember to vote!
> Consider donating to Postgres: http://www.postgresql.org/about/donate
>


Re: proposal: tuplestore, tuplesort aggregate functions

From
David Fetter
Date:
On Wed, Aug 18, 2010 at 10:39:33AM -0400, Tom Lane wrote:
> David Fetter <david@fetter.org> writes:
> > Apart from the medians, which "median-like" aggregates do you have in
> > mind to start with?  If you can provide examples of "median-like"
> > aggregates that people might need to implement as user-defined
> > aggregates, or other places where people would use this machinery, it
> > will make your case stronger for this refactoring.
> 
> There would be plenty of scope to re-use the machinery without any
> SQL-level extensions.  All you need is a polymorphic aggregate
> transition function that maintains a tuplestore or whatever.
> I don't see that extra syntax in CREATE AGGREGATE is really buying
> much of anything.

Thanks for clarifying.  Might this help out with things like GROUPING
SETS or wCTEs?

Cheers,
David (a little slow today).
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Re: proposal: tuplestore, tuplesort aggregate functions

From
Pavel Stehule
Date:
2010/8/18 Tom Lane <tgl@sss.pgh.pa.us>:
> David Fetter <david@fetter.org> writes:
>> Apart from the medians, which "median-like" aggregates do you have in
>> mind to start with?  If you can provide examples of "median-like"
>> aggregates that people might need to implement as user-defined
>> aggregates, or other places where people would use this machinery, it
>> will make your case stronger for this refactoring.
>
> There would be plenty of scope to re-use the machinery without any
> SQL-level extensions.  All you need is a polymorphic aggregate
> transition function that maintains a tuplestore or whatever.
> I don't see that extra syntax in CREATE AGGREGATE is really buying
> much of anything.
>

Have we to use a transisdent function? If we implement median as
special variant of aggregate - because we need to push an sort, then
we can skip a transident function function - and call directly final
function. This mechanism is used for aggregates with ORDER BY now. So
there can be a special path for direct call of final func. There is
useles to call transident function.

Regards

Pavel




>                        regards, tom lane
>


Re: proposal: tuplestore, tuplesort aggregate functions

From
Tom Lane
Date:
David Fetter <david@fetter.org> writes:
> On Wed, Aug 18, 2010 at 10:39:33AM -0400, Tom Lane wrote:
>> There would be plenty of scope to re-use the machinery without any
>> SQL-level extensions.  All you need is a polymorphic aggregate
>> transition function that maintains a tuplestore or whatever.
>> I don't see that extra syntax in CREATE AGGREGATE is really buying
>> much of anything.

> Thanks for clarifying.  Might this help out with things like GROUPING
> SETS or wCTEs?

Don't see how --- this is just about what you can do within an aggregate.
        regards, tom lane


Re: proposal: tuplestore, tuplesort aggregate functions

From
David Fetter
Date:
On Wed, Aug 18, 2010 at 04:46:57PM +0200, Pavel Stehule wrote:
> 2010/8/18 Tom Lane <tgl@sss.pgh.pa.us>:
> > David Fetter <david@fetter.org> writes:
> >> Apart from the medians, which "median-like" aggregates do you
> >> have in mind to start with?  If you can provide examples of
> >> "median-like" aggregates that people might need to implement as
> >> user-defined aggregates, or other places where people would use
> >> this machinery, it will make your case stronger for this
> >> refactoring.
> >
> > There would be plenty of scope to re-use the machinery without any
> > SQL-level extensions.  All you need is a polymorphic aggregate
> > transition function that maintains a tuplestore or whatever.  I
> > don't see that extra syntax in CREATE AGGREGATE is really buying
> > much of anything.
> >
> 
> Have we to use a transisdent function? If we implement median as
> special variant of aggregate - because we need to push an sort, then
> we can skip a transident function function - and call directly final
> function. This mechanism is used for aggregates with ORDER BY now.
> So there can be a special path for direct call of final func. There
> is useles to call transident function.

Just a wacky idea here.  Could we make a special state transition
function called IDENTITY or some such that would turn into a noop?

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Re: proposal: tuplestore, tuplesort aggregate functions

From
Tom Lane
Date:
Pavel Stehule <pavel.stehule@gmail.com> writes:
> 2010/8/18 Tom Lane <tgl@sss.pgh.pa.us>:
>> There would be plenty of scope to re-use the machinery without any
>> SQL-level extensions.  All you need is a polymorphic aggregate
>> transition function that maintains a tuplestore or whatever.

> Have we to use a transisdent function? If we implement median as
> special variant of aggregate - because we need to push an sort, then
> we can skip a transident function function - and call directly final
> function.

Well, that would require a whole bunch of *other* mechanisms, which you
weren't saying anything about in your original proposal.  But driving
it off the transtype declaration would be quite inappropriate anyway IMO.
        regards, tom lane


Re: proposal: tuplestore, tuplesort aggregate functions

From
Pavel Stehule
Date:
2010/8/18 Tom Lane <tgl@sss.pgh.pa.us>:
> Pavel Stehule <pavel.stehule@gmail.com> writes:
>> 2010/8/18 Tom Lane <tgl@sss.pgh.pa.us>:
>>> There would be plenty of scope to re-use the machinery without any
>>> SQL-level extensions.  All you need is a polymorphic aggregate
>>> transition function that maintains a tuplestore or whatever.
>
>> Have we to use a transisdent function? If we implement median as
>> special variant of aggregate - because we need to push an sort, then
>> we can skip a transident function function - and call directly final
>> function.
>
> Well, that would require a whole bunch of *other* mechanisms, which you
> weren't saying anything about in your original proposal.  But driving
> it off the transtype declaration would be quite inappropriate anyway IMO.
>

I'll test both variant first. Maybe there are not any significant
difference between them. Now nodeAgg can build, fill a tuplesort. So I
think is natural use it. It needs only one - skip a calling a
transident function and directly call final function with external
tuplesort. Minimally you don't need 2x same code.

Regards

Pavel Stehule

>                        regards, tom lane
>


Re: proposal: tuplestore, tuplesort aggregate functions

From
Pavel Stehule
Date:
Hello

>
> I'll test both variant first. Maybe there are not any significant
> difference between them. Now nodeAgg can build, fill a tuplesort. So I
> think is natural use it. It needs only one - skip a calling a
> transident function and directly call final function with external
> tuplesort. Minimally you don't need 2x same code.

yesterday I did a small test. Aggregates without transident functions
are only about 2% faster, so there has no sense thinking more about
them.  I'll send a patch with median and percentile functions
immediately - these functions are implemented like usual aggregates.

Regards

Pavel

>
> Regards
>
> Pavel Stehule
>
>>                        regards, tom lane
>>
>


Re: proposal: tuplestore, tuplesort aggregate functions

From
David Fetter
Date:
On Thu, Aug 19, 2010 at 12:45:13PM +0200, Pavel Stehule wrote:
> Hello
> 
> > I'll test both variant first. Maybe there are not any significant
> > difference between them. Now nodeAgg can build, fill a tuplesort.
> > So I think is natural use it. It needs only one - skip a calling a
> > transident function and directly call final function with external
> > tuplesort. Minimally you don't need 2x same code.
> 
> yesterday I did a small test. Aggregates without transident
> functions are only about 2% faster, so there has no sense thinking
> more about them.  I'll send a patch with median and percentile
> functions immediately - these functions are implemented like usual
> aggregates.

NTILE is already a windowing function.  Might want to check into any
performance improvements you can give that.

As to median, please make sure you say in detail which median you're
using and name it so, as there is no single, authoritative median.

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Re: proposal: tuplestore, tuplesort aggregate functions

From
Pavel Stehule
Date:
2010/8/19 David Fetter <david@fetter.org>:
> On Thu, Aug 19, 2010 at 12:45:13PM +0200, Pavel Stehule wrote:
>> Hello
>>
>> > I'll test both variant first. Maybe there are not any significant
>> > difference between them. Now nodeAgg can build, fill a tuplesort.
>> > So I think is natural use it. It needs only one - skip a calling a
>> > transident function and directly call final function with external
>> > tuplesort. Minimally you don't need 2x same code.
>>
>> yesterday I did a small test. Aggregates without transident
>> functions are only about 2% faster, so there has no sense thinking
>> more about them.  I'll send a patch with median and percentile
>> functions immediately - these functions are implemented like usual
>> aggregates.
>
> NTILE is already a windowing function.  Might want to check into any
> performance improvements you can give that.

The performance has to be +/- same. It is based on same technology - tuplesort.

>
> As to median, please make sure you say in detail which median you're
> using and name it so, as there is no single, authoritative median.

I searched about this. And I found so there is a different methods for
same statistic value with different names. But result has to be same.
I don't found a other median than "arithmetic" or "financial" median
(with a two derived forms - left, right median). These methods are a
different forms of some SQL - see
http://www.simple-talk.com/sql/t-sql-programming/median-workbench/ and
it is SQL related solutions. With tuplesort I can to use a simple
solution - I have a number of elements, have  a sorted set and can
move to n position is set.
Next forms of median what I found are estimations.

regards

Pavel

>
> Cheers,
> David.
> --
> David Fetter <david@fetter.org> http://fetter.org/
> Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
> Skype: davidfetter      XMPP: david.fetter@gmail.com
> iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics
>
> Remember to vote!
> Consider donating to Postgres: http://www.postgresql.org/about/donate
>


Re: proposal: tuplestore, tuplesort aggregate functions

From
Robert Haas
Date:
On Thu, Aug 19, 2010 at 9:46 AM, David Fetter <david@fetter.org> wrote:
> As to median, please make sure you say in detail which median you're
> using and name it so, as there is no single, authoritative median.

You've made this assertion at least three times now, but I confess
that I've only ever learned one way to compute a median; and quick
Google searches for "median", "kinds of median", and few other
variants failed to turn up anything obvious either.  It seems to me
that if median is good enough for Oracle, Sybase, Excel, and the nun
who taught my fifth-grade elementary school class, it ought to be good
enough for us, too.  (Don't mess with Sr. Catherine!)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: proposal: tuplestore, tuplesort aggregate functions

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> You've made this assertion at least three times now, but I confess
> that I've only ever learned one way to compute a median; and quick
> Google searches for "median", "kinds of median", and few other
> variants failed to turn up anything obvious either.

There are different ways to define it when the number of samples is even.
However I believe that "use the mean of the two middle values" is much
the most common way to deal with that.
        regards, tom lane


Re: proposal: tuplestore, tuplesort aggregate functions

From
Robert Haas
Date:
On Thu, Aug 19, 2010 at 4:21 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> You've made this assertion at least three times now, but I confess
>> that I've only ever learned one way to compute a median; and quick
>> Google searches for "median", "kinds of median", and few other
>> variants failed to turn up anything obvious either.
>
> There are different ways to define it when the number of samples is even.
> However I believe that "use the mean of the two middle values" is much
> the most common way to deal with that.

I suppose there could also be a bit of an ambiguity if you're working
with a type like int4 where the values are discrete steps.  Like, what
do you do with {1, 2}?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: proposal: tuplestore, tuplesort aggregate functions

From
"Kevin Grittner"
Date:
Robert Haas <robertmhaas@gmail.com> wrote:
> I suppose there could also be a bit of an ambiguity if you're
working
> with a type like int4 where the values are discrete steps.  Like,
what
> do you do with {1, 2}?
The same thing you do with the avg function?
-Kevin


Re: proposal: tuplestore, tuplesort aggregate functions

From
Tom Lane
Date:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> Robert Haas <robertmhaas@gmail.com> wrote:
>> I suppose there could also be a bit of an ambiguity if you're working
>> with a type like int4 where the values are discrete steps.  Like, what
>> do you do with {1, 2}?

Hmm, good point.
> The same thing you do with the avg function?

avg's approach is not at all datatype-independent though.  If you're
willing to give up the idea of a polymorphic median() function, that
would probably be the thing to do.  If not, you need to take the left
or right one of the two central elements.
        regards, tom lane


Re: proposal: tuplestore, tuplesort aggregate functions

From
David Fetter
Date:
On Thu, Aug 19, 2010 at 05:40:46PM -0400, Tom Lane wrote:
> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> > Robert Haas <robertmhaas@gmail.com> wrote:
> >> I suppose there could also be a bit of an ambiguity if you're working
> >> with a type like int4 where the values are discrete steps.  Like, what
> >> do you do with {1, 2}?
> 
> Hmm, good point.
>  
> > The same thing you do with the avg function?
> 
> avg's approach is not at all datatype-independent though.  If you're
> willing to give up the idea of a polymorphic median() function, that
> would probably be the thing to do.  If not, you need to take the left
> or right one of the two central elements.

Whether the median needs to be in the sample is one question that
doesn't really have a universal right answer.  A separate question,
also without a universal right answer, is whether the median needs to
be in the same domain as the sample, int4 being the case above.

We could just go with "whatever Oracle, DB2 and MS-SQL Server have,"
assuming it's the same thing, until something appears in the SQL
standard.

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Re: proposal: tuplestore, tuplesort aggregate functions

From
Robert Haas
Date:
On Thu, Aug 19, 2010 at 5:53 PM, David Fetter <david@fetter.org> wrote:
> On Thu, Aug 19, 2010 at 05:40:46PM -0400, Tom Lane wrote:
>> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
>> > Robert Haas <robertmhaas@gmail.com> wrote:
>> >> I suppose there could also be a bit of an ambiguity if you're working
>> >> with a type like int4 where the values are discrete steps.  Like, what
>> >> do you do with {1, 2}?
>>
>> Hmm, good point.
>>
>> > The same thing you do with the avg function?
>>
>> avg's approach is not at all datatype-independent though.  If you're
>> willing to give up the idea of a polymorphic median() function, that
>> would probably be the thing to do.  If not, you need to take the left
>> or right one of the two central elements.
>
> Whether the median needs to be in the sample is one question that
> doesn't really have a universal right answer.  A separate question,
> also without a universal right answer, is whether the median needs to
> be in the same domain as the sample, int4 being the case above.
>
> We could just go with "whatever Oracle, DB2 and MS-SQL Server have,"
> assuming it's the same thing, until something appears in the SQL
> standard.

That's usually a sensible default, when in doubt.  If nothing else, it
improves compatibility.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


Re: proposal: tuplestore, tuplesort aggregate functions

From
David Fetter
Date:
On Thu, Aug 19, 2010 at 06:03:57PM -0400, Robert Haas wrote:
> On Thu, Aug 19, 2010 at 5:53 PM, David Fetter <david@fetter.org> wrote:
> > On Thu, Aug 19, 2010 at 05:40:46PM -0400, Tom Lane wrote:
> >> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> >> > Robert Haas <robertmhaas@gmail.com> wrote:
> >> >> I suppose there could also be a bit of an ambiguity if you're working
> >> >> with a type like int4 where the values are discrete steps.  Like, what
> >> >> do you do with {1, 2}?
> >>
> >> Hmm, good point.
> >>
> >> > The same thing you do with the avg function?
> >>
> >> avg's approach is not at all datatype-independent though.  If
> >> you're willing to give up the idea of a polymorphic median()
> >> function, that would probably be the thing to do.  If not, you
> >> need to take the left or right one of the two central elements.
> >
> > Whether the median needs to be in the sample is one question that
> > doesn't really have a universal right answer.  A separate
> > question, also without a universal right answer, is whether the
> > median needs to be in the same domain as the sample, int4 being
> > the case above.
> >
> > We could just go with "whatever Oracle, DB2 and MS-SQL Server
> > have," assuming it's the same thing, until something appears in
> > the SQL standard.
> 
> That's usually a sensible default, when in doubt.  If nothing else,
> it improves compatibility.

That's assuming we find such a thing, which I haven't yet.

Trying to stay cheery,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate