Thread: Aggregate ORDER BY patch

Aggregate ORDER BY patch

From
Andrew Gierth
Date:
Herewith a patch to implement agg(foo ORDER BY bar) with or without
DISTINCT, etc. No artificial restrictions are imposed on what
syntactical combinations are allowed. However, ORDER BY is not allowed
with aggregates used as window functions (as per the existing
restriction on DISTINCT).

Included are regression tests but not docs yet, those will follow
shortly in a separate patch (to save having to keep redoing the code
patch like last time).

Caveat: as discussed earlier, this patch changes the behaviour of
array_agg(DISTINCT x) when applied to NULL inputs. Formerly, the NULLs
were unconditionally skipped; now, they are treated just like DISTINCT
or GROUP BY normally do.

The previous restriction of agg(DISTINCT ...) to single-argument
aggregates is removed. However, there is still a separate code path
for aggregates that use DISTINCT or ORDER BY with only one input
column, for performance reasons.

If a non-default ordering operator is used in combination with
DISTINCT, then the notion of "equality" used for the DISTINCT
comparisons is the one that belongs to the ordering, rather than the
default.

--
Andrew.


Attachment

Re: Aggregate ORDER BY patch

From
Heikki Linnakangas
Date:
Andrew Gierth wrote:
> Herewith a patch to implement agg(foo ORDER BY bar) with or without
> DISTINCT, etc.

What does that mean? Aggregate functions are supposed to be commutative,
right?

> No artificial restrictions are imposed on what
> syntactical combinations are allowed. However, ORDER BY is not allowed
> with aggregates used as window functions (as per the existing
> restriction on DISTINCT).

How is this different from window functions?

--  Heikki Linnakangas EnterpriseDB   http://www.enterprisedb.com


Re: Aggregate ORDER BY patch

From
Greg Stark
Date:
On Fri, Nov 13, 2009 at 7:54 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> Andrew Gierth wrote:
>> Herewith a patch to implement agg(foo ORDER BY bar) with or without
>> DISTINCT, etc.
>
> What does that mean? Aggregate functions are supposed to be commutative,
> right?

We certainly have non-commutative agggregates currently, notably array_agg()

I suspect it would be good to have a flag marking the commutative ones


-- 
greg


Re: Aggregate ORDER BY patch

From
Peter Eisentraut
Date:
On fre, 2009-11-13 at 03:16 +0000, Andrew Gierth wrote:
> Caveat: as discussed earlier, this patch changes the behaviour of
> array_agg(DISTINCT x) when applied to NULL inputs. Formerly, the NULLs
> were unconditionally skipped; now, they are treated just like DISTINCT
> or GROUP BY normally do.

The right answer to that should be in the SQL standard.




Re: Aggregate ORDER BY patch

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> On fre, 2009-11-13 at 03:16 +0000, Andrew Gierth wrote:
>> Caveat: as discussed earlier, this patch changes the behaviour of
>> array_agg(DISTINCT x) when applied to NULL inputs. Formerly, the NULLs
>> were unconditionally skipped; now, they are treated just like DISTINCT
>> or GROUP BY normally do.

> The right answer to that should be in the SQL standard.

It's not.  The standard defines the behavior of certain specific
aggregates; it doesn't provide general rules that would apply to
user-defined aggregates.  In particular, all the standard aggregates
are strict and so they just ignore nulls anyhow.  The proposed change
would only affect the behavior of aggregates with non-strict transition
functions.
        regards, tom lane


Re: Aggregate ORDER BY patch

From
Tom Lane
Date:
Greg Stark <gsstark@mit.edu> writes:
> On Fri, Nov 13, 2009 at 7:54 AM, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
>> Andrew Gierth wrote:
>>> Herewith a patch to implement agg(foo ORDER BY bar) with or without
>>> DISTINCT, etc.
>> 
>> What does that mean? Aggregate functions are supposed to be commutative,
>> right?

> We certainly have non-commutative agggregates currently, notably array_agg()

Right.  The fact that none of the standard aggregates are
order-sensitive doesn't mean that it's not useful to have user-defined
ones that are.  Currently we suggest fetching from an ordered sub-select
if you want to use an aggregate that is input order sensitive.  This
patch just provides an alternative (and equally nonstandard) notation
for that.

I'm not entirely convinced that adding ORDER BY here is a good idea,
partly because it goes so far beyond the spec and partly because it's
not going to be easily optimizable.  But I can see that there is a
use-case.
        regards, tom lane


Re: Aggregate ORDER BY patch

From
Peter Eisentraut
Date:
On fre, 2009-11-13 at 10:01 -0500, Tom Lane wrote:
> Peter Eisentraut <peter_e@gmx.net> writes:
> > On fre, 2009-11-13 at 03:16 +0000, Andrew Gierth wrote:
> >> Caveat: as discussed earlier, this patch changes the behaviour of
> >> array_agg(DISTINCT x) when applied to NULL inputs. Formerly, the NULLs
> >> were unconditionally skipped; now, they are treated just like DISTINCT
> >> or GROUP BY normally do.
> 
> > The right answer to that should be in the SQL standard.
> 
> It's not.  The standard defines the behavior of certain specific
> aggregates; it doesn't provide general rules that would apply to
> user-defined aggregates.

But array_agg is in the standard.



Re: Aggregate ORDER BY patch

From
Peter Eisentraut
Date:
On fre, 2009-11-13 at 10:35 -0500, Tom Lane wrote:
> I'm not entirely convinced that adding ORDER BY here is a good idea,
> partly because it goes so far beyond the spec

This is exactly the syntax that is in the spec AFAICT.



Re: Aggregate ORDER BY patch

From
Robert Haas
Date:
On Fri, Nov 13, 2009 at 10:35 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Greg Stark <gsstark@mit.edu> writes:
>> On Fri, Nov 13, 2009 at 7:54 AM, Heikki Linnakangas
>> <heikki.linnakangas@enterprisedb.com> wrote:
>>> Andrew Gierth wrote:
>>>> Herewith a patch to implement agg(foo ORDER BY bar) with or without
>>>> DISTINCT, etc.
>>>
>>> What does that mean? Aggregate functions are supposed to be commutative,
>>> right?
>
>> We certainly have non-commutative agggregates currently, notably array_agg()
>
> Right.  The fact that none of the standard aggregates are
> order-sensitive doesn't mean that it's not useful to have user-defined
> ones that are.  Currently we suggest fetching from an ordered sub-select
> if you want to use an aggregate that is input order sensitive.  This
> patch just provides an alternative (and equally nonstandard) notation
> for that.
>
> I'm not entirely convinced that adding ORDER BY here is a good idea,
> partly because it goes so far beyond the spec and partly because it's
> not going to be easily optimizable.  But I can see that there is a
> use-case.

Yeah, for sure.  I currently handle this, when necessary, by using
subselects, but it would sure be nice to have a more compact notation,
if there's a good way to do that.

...Robert


Re: Aggregate ORDER BY patch

From
Andrew Gierth
Date:
>>>>> "Heikki" == Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
>> Herewith a patch to implement agg(foo ORDER BY bar) with or>> without DISTINCT, etc.
Heikki> What does that mean? Aggregate functions are supposed to beHeikki> commutative, right?

The SQL spec defines two non-commutative aggregates that we implement:

array_agg(x ORDER BY ...)
xmlagg(x ORDER BY ...)

In addition, of course, we allow user-defined aggregates, which are
perfectly free to be non-commutative.

-- 
Andrew (irc:RhodiumToad)


Re: Aggregate ORDER BY patch

From
Andrew Gierth
Date:
>>>>> "Peter" == Peter Eisentraut <peter_e@gmx.net> writes:
>> I'm not entirely convinced that adding ORDER BY here is a good idea,>> partly because it goes so far beyond the
spec
Peter> This is exactly the syntax that is in the spec AFAICT.

Right. The spec defines this syntax for array_agg and xmlagg (only).
The patch goes beyond the spec in that it allows ORDER BY for any
aggregate at all, and also allows combination of ORDER BY and DISTINCT
(the spec only allows DISTINCT with <general set operation> which
doesn't include array_agg or xmlagg, so there is nowhere that both are
allowed).

But it would be entirely unreasonable, the way postgres works, to
implement ORDER BY for only specific aggregates.

(Note also that combining ORDER BY and DISTINCT can change the
behaviour of DISTINCT. e.g.  foo(distinct x order by x using <<<) will
use whatever definition of equality is implied by the hypothetical <<<
operator when comparing values of x for distinctness.)

As for the null handling, the spec is no help there for this reason:
it allows DISTINCT only for <general set operation>, and for all
<general set operation>s whether DISTINCT or not, NULLs are always
discarded from the input before processing (i.e. the behaviour we
implement for strict aggregates). So even the fact that we allow
array_agg(distinct x) at all is going beyond the spec and therefore
doesn't have a defined result.

-- 
Andrew (irc:RhodiumToad)


Re: Aggregate ORDER BY patch

From
Tom Lane
Date:
Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> "Peter" == Peter Eisentraut <peter_e@gmx.net> writes:
>  Peter> This is exactly the syntax that is in the spec AFAICT.

> Right. The spec defines this syntax for array_agg and xmlagg (only).

Cool, I had forgotten that they added that in the latest revisions.
I withdraw the complaint that this patch goes too far beyond the spec.

> But it would be entirely unreasonable, the way postgres works, to
> implement ORDER BY for only specific aggregates.

Quite.  This is another instance of the thing I complained of before,
that the SQL committee likes to define the behavior of specific
aggregates instead of inducing a generic aggregate-behavior definition.
So we're on our own to extract one, and this proposal seems pretty
reasonable to me: it's useful and it's consistent with the query-level
behavior of DISTINCT and ORDER BY.
        regards, tom lane


Re: Aggregate ORDER BY patch

From
Andres Freund
Date:
On Friday 13 November 2009 16:35:08 Tom Lane wrote:
> Greg Stark <gsstark@mit.edu> writes:
> > On Fri, Nov 13, 2009 at 7:54 AM, Heikki Linnakangas
> >
> > <heikki.linnakangas@enterprisedb.com> wrote:
> >> Andrew Gierth wrote:
> >>> Herewith a patch to implement agg(foo ORDER BY bar) with or without
> >>> DISTINCT, etc.
> >>
> >> What does that mean? Aggregate functions are supposed to be commutative,
> >> right?
> >
> > We certainly have non-commutative agggregates currently, notably
> > array_agg()
> 
> Right.  The fact that none of the standard aggregates are
> order-sensitive doesn't mean that it's not useful to have user-defined
> ones that are.  Currently we suggest fetching from an ordered sub-select
> if you want to use an aggregate that is input order sensitive.  This
> patch just provides an alternative (and equally nonstandard) notation
> for that.
> 
> I'm not entirely convinced that adding ORDER BY here is a good idea,
> partly because it goes so far beyond the spec and partly because it's
> not going to be easily optimizable.  But I can see that there is a
> use-case.
The spec supports the ORDER BY syntax for the xmlagg aggregate...

Andres


Re: Aggregate ORDER BY patch

From
Andrew Gierth
Date:
>>>>> "Heikki" == Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
>> No artificial restrictions are imposed on what syntactical>> combinations are allowed. However, ORDER BY is not
allowedwith>> aggregates used as window functions (as per the existing>> restriction on DISTINCT).
 
Heikki> How is this different from window functions?

Window functions return a row for each row of input, aggregates don't.

The reason I didn't tackle the case of aggregate functions used as
window functions is that the spec allows constructs like this:

array_agg(a order by b) over (order by c)

which can't be represented using the aggregate-as-window-function
mechanism as it currently stands, since you'd have to re-sort the
window each time.

-- 
Andrew (irc:RhodiumToad)


Re: Aggregate ORDER BY patch

From
Hitoshi Harada
Date:
2009/11/14 Andrew Gierth <andrew@tao11.riddles.org.uk>:
>>>>>> "Heikki" == Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
>
>  >> No artificial restrictions are imposed on what syntactical
>  >> combinations are allowed. However, ORDER BY is not allowed with
>  >> aggregates used as window functions (as per the existing
>  >> restriction on DISTINCT).
>
>  Heikki> How is this different from window functions?
>
> Window functions return a row for each row of input, aggregates don't.
>
> The reason I didn't tackle the case of aggregate functions used as
> window functions is that the spec allows constructs like this:
>
> array_agg(a order by b) over (order by c)
>
> which can't be represented using the aggregate-as-window-function
> mechanism as it currently stands, since you'd have to re-sort the
> window each time.
>
Now I'm about to send my patch to introduce more frame types,
aggregate cache mechanism in window functions may be broken sometimes,
and it is *possible* to put order-by clause in argument list if we
prepare tuplesort as in nodeAgg. But I don't see useful cases and it
seems so hard task that I'm not sold.


Regards,



--
Hitoshi Harada


Re: Aggregate ORDER BY patch

From
Hitoshi Harada
Date:
2009/11/14 Tom Lane <tgl@sss.pgh.pa.us>:
> Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
>> "Peter" == Peter Eisentraut <peter_e@gmx.net> writes:
>>  Peter> This is exactly the syntax that is in the spec AFAICT.
>
>> Right. The spec defines this syntax for array_agg and xmlagg (only).
>
> Cool, I had forgotten that they added that in the latest revisions.
> I withdraw the complaint that this patch goes too far beyond the spec.
>
>> But it would be entirely unreasonable, the way postgres works, to
>> implement ORDER BY for only specific aggregates.
>
> Quite.  This is another instance of the thing I complained of before,
> that the SQL committee likes to define the behavior of specific
> aggregates instead of inducing a generic aggregate-behavior definition.
> So we're on our own to extract one, and this proposal seems pretty
> reasonable to me: it's useful and it's consistent with the query-level
> behavior of DISTINCT and ORDER BY.
It's not only in aggregates but also window function as well as plain
functions like substring(x from t). In window functions, IGNORE NULLS
is defined in spec for those first_vlaue(), last_value(), lead(),
lag(), etc. but not for generic use. I'm +1 for an approach to apply
them for generic cases.

Regards,

--
Hitoshi Harada


Re: Aggregate ORDER BY patch

From
Greg Stark
Date:
On Fri, Nov 13, 2009 at 5:09 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Quite.  This is another instance of the thing I complained of before,
> that the SQL committee likes to define the behavior of specific
> aggregates instead of inducing a generic aggregate-behavior definition.

I think this makes sense from the point of view of the spec authors.
They're trying to standardize the behaviour of the functions that
their existing implementations provide without creating extra demands
on themselves to implement new features. Even if some of them do
implement more general solutions the path of least resistance to
getting their syntax standardized will be the one which imposes the
least cost on the other members of the committee.

> So we're on our own to extract one, and this proposal seems pretty
> reasonable to me: it's useful and it's consistent with the query-level
> behavior of DISTINCT and ORDER BY.

++


--
greg


Re: Aggregate ORDER BY patch

From
Hitoshi Harada
Date:
Here's my first review.

The patch was in context diff format and applied cleanly with a little
3 hunks in parse_expr.c. make succeeded without any warnings and make
check passed all 121 tests.

It implements as advertised, following SQL spec with extension of both
DISTINCT clause and ORDER BY clause are available in any aggregate
functions including user defined ones. It supports VIEWs by adding
code in ruleutils.c.

Questions here:
- agglevelsup?
We have aggregate capability that all arguments from upper level query
in downer level aggregate makes aggregate call itself to upper level
call, as a constant value in downer level. What if ORDER BY clause has
downer level Vars? For example:

regression=# select (select count(t1.four order by unique1) from tenk1
limit 1) from tenk1 t1 where unique1 < 10;?column?
----------   10000   10000   10000   10000   10000   10000   10000   10000   10000   10000
(10 rows)

regression=# select (select count(t1.four order by t1.unique1) from
tenk1 limit 1) from tenk1 t1 where unique1 < 10;?column?
----------      10
(1 row)

Is it sane? The result is consistent but surprised me a little. No
need to raise an error?

- order by 1?
Normal ORDER BY clause accepts constant integer as TargetEntry's
resno. The patch seems not to support it.

regression=# select array_agg(four order by 1) from tenk1 where unique1 < 10;      array_agg
-----------------------{0,2,1,2,1,0,1,3,3,0}
(1 row)

Shouldn't it be the same as normal ORDER BY?

Performance doesn't seem slowing down, though I don't have
quantitative test result.

Coding, almost all sane. Since its syntax and semantics are similar to
existing DISTINCT and ORDER BY features, parsing and transformation
code are derived from those area. The executor has few issues:

- #include in nodeAgg.c
executor/tuptable.h is added in the patch but required really?
I removed that line but still build without any warnings.

- process_ordered_aggregate_(single|multi)
It seems that the patch left process_sorted_aggregate() function as
process_ordered_aggregate_single() and added
process_ordered_aggregate_multi() for more than one input arguments
(actually target entries) case. Why have those two? Could we combine
them? Or I couldn't find convincing reason in comments.

And ParseFuncOrColumn() in parse_func.c now gets more complicated.
Since feature / semantics are similar, I bet we may share some code to
transform DISTINCT and ORDER BY with traditional code in
parse_clause.c, though I'm not sure nor don't have clear idea.
Especially, code around here
        save_next_resno = pstate->p_next_resno;        pstate->p_next_resno = attno + 1;

cheats pstate to transform clauses and I felt a bit fear.
- SortGroupClause.implicit
"implicit" member was added in SortGroupClause. I didn't find clear
reason to add this. Could you show a case to clarify this?

That's it for now.

Regards,



-- 
Hitoshi Harada


Re: Aggregate ORDER BY patch

From
Andrew Gierth
Date:
>>>>> "Hitoshi" == Hitoshi Harada <umi.tanuki@gmail.com> writes:
Hitoshi> Questions here:Hitoshi> - agglevelsup?
Hitoshi> We have aggregate capability that all arguments from upperHitoshi> level query in downer level aggregate makes
aggregatecallHitoshi> itself to upper level call, as a constant value in downerHitoshi> level. What if ORDER BY clause
hasdowner level Vars?
 

For determining what query level the aggregate belongs to, the
expressions in the ORDER BY clause are counted along with the actual
argument expressions.
Hitoshi> Is it sane? The result is consistent but surprised me aHitoshi> little. No need to raise an error?

What case exactly would you consider an error? When an order by
expression references a lower (more deeply nested) query level than
any of the actual arguments?
Hitoshi> - order by 1?
Hitoshi> Normal ORDER BY clause accepts constant integer asHitoshi> TargetEntry's resno. The patch seems not to support
it.Hitoshi>Shouldn't it be the same as normal ORDER BY?
 

Specifically documented. The SQL spec doesn't allow ordinal positions
in ORDER BY any more (those are a holdover from SQL92) and we don't
support them in, for example, window ORDER BY clauses.
Hitoshi> Performance doesn't seem slowing down, though I don't haveHitoshi> quantitative test result.

The performance is intended to be no worse than DISTINCT already was,
though it's also no better.
Hitoshi> Coding, almost all sane. Since its syntax and semantics areHitoshi> similar to existing DISTINCT and ORDER BY
features,parsingHitoshi> and transformation code are derived from those area. TheHitoshi> executor has few issues:
 
Hitoshi> - #include in nodeAgg.cHitoshi> executor/tuptable.h is added in the patch but required really?Hitoshi> I
removedthat line but still build without any warnings.
 

The code is making explicit use of various Slot calls declared in
tuptable.h. The only reason why it builds without error when you
remove that is that utils/tuplesort.h happens to include tuptable.h
indirectly.
Hitoshi> - process_ordered_aggregate_(single|multi)Hitoshi> It seems that the patch left
process_sorted_aggregate()Hitoshi>function as process_ordered_aggregate_single() and addedHitoshi>
process_ordered_aggregate_multi()for more than one inputHitoshi> arguments (actually target entries) case. Why have
thoseHitoshi>two? Could we combine them? Or I couldn't find convincingHitoshi> reason in comments.
 

Performance.

tuplesort_getdatum etc. seems to be substantially faster than
tuplesort_gettupleslot especially for the case where you're sorting a
pass-by-value datum such as an integer (since the datum is then stored
only in the sort tuple header and doesn't require a separate space
allocation for itself). Using a slot in all cases would have slowed
down some common cases like count(distinct id) by a measurable amount.

Cases like array_agg(x order by x) benefit from the faster code path
too.

The memory management between the two cases is sufficiently different
that combining them into one function while still maintaining the
slot vs. datum distinction would be ugly and probably error-prone.
The relatively minor duplication of logic seemed much clearer to me.
Hitoshi> And ParseFuncOrColumn() in parse_func.c now gets moreHitoshi> complicated.

I thought very hard about breaking some of that out into a separate
function, but it wasn't initially clear which parts might have needed
access to the original raw parsetree. I'm open to opinions on this.
Hitoshi> Since feature / semantics are similar, I bet we may shareHitoshi> some code to transform DISTINCT and ORDER BY
withHitoshi>traditional code in parse_clause.c, though I'm not sure norHitoshi> don't have clear idea.  Especially,
codearound here
 
Hitoshi>     save_next_resno = pstate->p_next_resno;Hitoshi>     pstate->p_next_resno = attno + 1;
Hitoshi> cheats pstate to transform clauses and I felt a bit fear.

The code that transforms RETURNING clauses does something similar with
p_next_resno.

Almost all the work of transforming the ORDER BY clause is actually
done via the existing transformSortClause (which is the reason why
p_next_resno needs to be saved and restored), the additional logic
is only for the DISTINCT case, to validate the correspondance between
DISTINCT args and ORDER BY args and to generate implicit ordering
clauses (which provide comparison function info to the executor)
when needed.
Hitoshi>  - SortGroupClause.implicitHitoshi> "implicit" member was added in SortGroupClause. I didn'tHitoshi> find
clearreason to add this. Could you show a case toHitoshi> clarify this?
 

Without that flag or something like it, when you do

create view foo as select count(distinct x) from table;

and then display the view definition, you would get back the query as
"select count(distinct x order by x) from table" which would be
confusing and unnecessarily backwards- and forwards-incompatible.

So the code sets "implicit" for any SortGroupClause that is added for
some internal reason rather than being present in the original query,
and the deparse code in ruleutils skips such clauses.

-- 
Andrew.


Re: Aggregate ORDER BY patch

From
Andrew Gierth
Date:
>>>>> "Andrew" == Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
Andrew> Performance.
Andrew> tuplesort_getdatum etc. seems to be substantially faster thanAndrew> tuplesort_gettupleslot especially for the
casewhere you'reAndrew> sorting a pass-by-value datum such as an integer (since theAndrew> datum is then stored only in
thesort tuple header andAndrew> doesn't require a separate space allocation forAndrew> itself). Using a slot in all
caseswould have slowed downAndrew> some common cases like count(distinct id) by a measurableAndrew> amount.
 
Andrew> Cases like array_agg(x order by x) benefit from the fasterAndrew> code path too.
Andrew> The memory management between the two cases is sufficientlyAndrew> different that combining them into one
functionwhile stillAndrew> maintaining the slot vs. datum distinction would be ugly andAndrew> probably error-prone.
Therelatively minor duplication ofAndrew> logic seemed much clearer to me.
 

Just to quantify this, using a production-quality build (optimized and
without assertions), it turns out that the fast code path
(process_ordered_aggregate_single) is faster by 300% for pass-by-value
types, and by approximately 20% for short values of pass-by-reference
types, as compared to disabling that code path and forcing even the
one-arg case to use the slot interface.

So using the slot interface for everything would have constituted a
300% slowdown over the older code for count(distinct id), obviously
undesirable.

As it stands, I can't detect any performance regression over the
previous code.

This means that agg(x order by y) is rather noticably slower than
agg(x order by x), but this is pretty much unavoidable given how the
sorting code works.

Future performance enhancements (which I have no particular plans to
tackle) would involve having the planner consult the desired aggregate
orderings and estimating the cost of sorting as opposed to the cost of
producing a plan with the input already ordered. Also combining the
sort step for aggregates that share a single ordering.

-- 
Andrew (irc:RhodiumToad)


Re: Aggregate ORDER BY patch

From
Greg Stark
Date:
On Sun, Nov 15, 2009 at 11:23 PM, Andrew Gierth
<andrew@tao11.riddles.org.uk> wrote:
> Future performance enhancements (which I have no particular plans to
> tackle) would involve having the planner consult the desired aggregate
> orderings and estimating the cost of sorting as opposed to the cost of
> producing a plan with the input already ordered. Also combining the
> sort step for aggregates that share a single ordering.

Those both seem like pretty important things. Do you have an idea how
to go about doing them?


-- 
greg


Re: Aggregate ORDER BY patch

From
Hitoshi Harada
Date:
2009/11/16 Andrew Gierth <andrew@tao11.riddles.org.uk>:
>>>>>> "Hitoshi" == Hitoshi Harada <umi.tanuki@gmail.com> writes:
>
>  Hitoshi> Questions here:
>  Hitoshi> - agglevelsup?
> What case exactly would you consider an error? When an order by
> expression references a lower (more deeply nested) query level than
> any of the actual arguments?

It's only that I felt not intuitive. To me, arguments are regarded as
aggregate's "member" while ORDER BY clause expressions didn't hit me.
If it's only me, no problem. Maybe additional document in
#syntax-aggregates will do.

>  Hitoshi> - order by 1?
>
>  Hitoshi> Normal ORDER BY clause accepts constant integer as
>  Hitoshi> TargetEntry's resno. The patch seems not to support it.
>  Hitoshi> Shouldn't it be the same as normal ORDER BY?
>
> Specifically documented. The SQL spec doesn't allow ordinal positions
> in ORDER BY any more (those are a holdover from SQL92) and we don't
> support them in, for example, window ORDER BY clauses.

Clear.

>  Hitoshi> Coding, almost all sane. Since its syntax and semantics are
>  Hitoshi> similar to existing DISTINCT and ORDER BY features, parsing
>  Hitoshi> and transformation code are derived from those area. The
>  Hitoshi> executor has few issues:
>
>  Hitoshi> - #include in nodeAgg.c
>  Hitoshi> executor/tuptable.h is added in the patch but required really?
>  Hitoshi> I removed that line but still build without any warnings.
>
> The code is making explicit use of various Slot calls declared in
> tuptable.h. The only reason why it builds without error when you
> remove that is that utils/tuplesort.h happens to include tuptable.h
> indirectly.
>
> The code is making explicit use of various Slot calls declared in
> tuptable.h. The only reason why it builds without error when you
> remove that is that utils/tuplesort.h happens to include tuptable.h
> indirectly.

My C skill is not so good to determine if that #include is needed.
Simply we'd not needed it, it seems to me that it isn't needed still.

>  Hitoshi> - process_ordered_aggregate_(single|multi)
>  Hitoshi> It seems that the patch left process_sorted_aggregate()
>  Hitoshi> function as process_ordered_aggregate_single() and added
>  Hitoshi> process_ordered_aggregate_multi() for more than one input
>  Hitoshi> arguments (actually target entries) case. Why have those
>  Hitoshi> two? Could we combine them? Or I couldn't find convincing
>  Hitoshi> reason in comments.
>
> Performance.
>
> tuplesort_getdatum etc. seems to be substantially faster than
> tuplesort_gettupleslot especially for the case where you're sorting a
> pass-by-value datum such as an integer (since the datum is then stored
> only in the sort tuple header and doesn't require a separate space
> allocation for itself). Using a slot in all cases would have slowed
> down some common cases like count(distinct id) by a measurable amount.
>
> Cases like array_agg(x order by x) benefit from the faster code path
> too.
>
> The memory management between the two cases is sufficiently different
> that combining them into one function while still maintaining the
> slot vs. datum distinction would be ugly and probably error-prone.
> The relatively minor duplication of logic seemed much clearer to me.

I see the reason. But if we allow this, shouldn't we apply the same
logic to other sort depend parts? Or maybe refactor tuplesort in the
future?

>
>  Hitoshi>  - SortGroupClause.implicit
>  Hitoshi> "implicit" member was added in SortGroupClause. I didn't
>  Hitoshi> find clear reason to add this. Could you show a case to
>  Hitoshi> clarify this?
>
> Without that flag or something like it, when you do
>
> create view foo as select count(distinct x) from table;
>
> and then display the view definition, you would get back the query as
> "select count(distinct x order by x) from table" which would be
> confusing and unnecessarily backwards- and forwards-incompatible.
>
> So the code sets "implicit" for any SortGroupClause that is added for
> some internal reason rather than being present in the original query,
> and the deparse code in ruleutils skips such clauses.

I don't have much experiences in VIEW systems, but isn't it enough to
let "order by x" omitted? "select count(distinct x order by x) from
table" means the same as "select count(distinct x) from table"
currently. ruleutils can handle it if distinct expressions are equal
to order by expressions.

Regards,


--
Hitoshi Harada


Re: Aggregate ORDER BY patch

From
Tom Lane
Date:
Hitoshi Harada <umi.tanuki@gmail.com> writes:
> 2009/11/16 Andrew Gierth <andrew@tao11.riddles.org.uk>:
> "Hitoshi" == Hitoshi Harada <umi.tanuki@gmail.com> writes:
>>> �Hitoshi> �- SortGroupClause.implicit
>>> �Hitoshi> "implicit" member was added in SortGroupClause. I didn't
>>> �Hitoshi> find clear reason to add this. Could you show a case to
>>> �Hitoshi> clarify this?
>> 
>> So the code sets "implicit" for any SortGroupClause that is added for
>> some internal reason rather than being present in the original query,
>> and the deparse code in ruleutils skips such clauses.

> I don't have much experiences in VIEW systems, but isn't it enough to
> let "order by x" omitted?

I agree with Hitoshi that this seems to be a hack to deal with the
consequences of a bad design decision.  We just recently cleaned up
an ancient mistake of the same sort (having the parser add clauses
to ORDER BY/DISTINCT that the user didn't write) and I don't care
to repeat that error in aggregates.

If it's necessary to decorate the aggregate with additional clauses in
order to make the executor work, the proper place to do that is the
planner.  The reason this division of labor is worth preserving is
that if there are alternative implementations that might reasonably be
chosen, the planner is the place to make that choice.  Nailing down
sort order in the parser is pre-judging something the planner might
wish to change.  (As an example, the reason we had to fix the ORDER
BY/DISTINCT mistake was that it was constraining the planner's choices
about substituting hash aggregation for sort/unique.)
        regards, tom lane


Re: Aggregate ORDER BY patch

From
Andrew Gierth
Date:
>>>>> "Hitoshi" == Hitoshi Harada <umi.tanuki@gmail.com> writes:
>> What case exactly would you consider an error? When an order by>> expression references a lower (more deeply nested)
querylevel>> than any of the actual arguments?
 
Hitoshi> It's only that I felt not intuitive. To me, arguments areHitoshi> regarded as aggregate's "member" while ORDER
BYclauseHitoshi> expressions didn't hit me.  If it's only me, noHitoshi> problem. Maybe additional document in
#syntax-aggregatesHitoshi>will do.
 

How about:
   ... But an exception occurs if the aggregate's arguments   (including any <literal>ORDER BY</> clause) contain only
outer-level variables: the aggregate then belongs to the nearest   such outer level, ...
 
>> Without that flag or something like it, when you do>> >> create view foo as select count(distinct x) from table;>>
>>and then display the view definition, you would get back the query as>> "select count(distinct x order by x) from
table"which would be>> confusing and unnecessarily backwards- and forwards-incompatible.>> >> So the code sets
"implicit"for any SortGroupClause that is added for>> some internal reason rather than being present in the original
query,>>and the deparse code in ruleutils skips such clauses.
 
Hitoshi> I don't have much experiences in VIEW systems, but isn't itHitoshi> enough to let "order by x" omitted?
"selectcount(distinct xHitoshi> order by x) from table" means the same as "selectHitoshi> count(distinct x) from table"
currently.ruleutils canHitoshi> handle it if distinct expressions are equal to order byHitoshi> expressions.
 

That doesn't work in more complex cases. For example, the user might
specify aggfunc(distinct x,y order by x) (not caring about the relative
order of y) but the code will still turn that internally into
aggfunc(distinct x,y order by x,y). It's necessary to be able to recover
what the user originally entered, which means needing to be able to
distinguish both of those cases from aggfunc(distinct x,y).

Other ways this could have been done (but which I rejected) were:
 1) separate lists of SortGroupClauses for "orderings the user    explicitly required" and "ordering we're going to
use"
 2) single list of SortGroupClauses, but a count of how many of the    entries are original, rather than added
 3) deferring the addition of extra ordering elements required for    distinctness until planning time

I didn't consider (3) because I wasn't really touching the planner
for this patch, and the information needed was already available in
parse analysis as part of the error checking.

-- 
Andrew.


Re: Aggregate ORDER BY patch

From
Andrew Gierth
Date:
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
Tom> I agree with Hitoshi that this seems to be a hack to deal with the[snip]

So copying the way that SELECT DISTINCT works would be the way to go?
i.e. keeping separate lists in the parse node for sorting and distinct?

What about error handling? If the user specifies agg(distinct x) where
x is not sortable, do we leave it to the planner to detect that (which
means not reporting the error position?)

-- 
Andrew.


Re: Aggregate ORDER BY patch

From
Tom Lane
Date:
Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> What about error handling? If the user specifies agg(distinct x) where
> x is not sortable, do we leave it to the planner to detect that (which
> means not reporting the error position?)

Well, at the moment there's only going to be a sort-based
implementation, so I don't object to throwing an error for that
as soon as possible.  OTOH I wouldn't recommend expending a lot
of code to do it there.  I would hope that most of the parser's
work for this can be shared with the existing support for query-level
ORDER BY/DISTINCT.  If that means that we don't complain immediately
about cases where there is hash but not sort support, that seems all
right to me, because there are very few such datatypes anyway.
        regards, tom lane


Re: Aggregate ORDER BY patch

From
Hitoshi Harada
Date:
2009/11/16 Andrew Gierth <andrew@tao11.riddles.org.uk>:
>>>>>> "Hitoshi" == Hitoshi Harada <umi.tanuki@gmail.com> writes:
>
>  >> What case exactly would you consider an error? When an order by
>  >> expression references a lower (more deeply nested) query level
>  >> than any of the actual arguments?
>
>  Hitoshi> It's only that I felt not intuitive. To me, arguments are
>  Hitoshi> regarded as aggregate's "member" while ORDER BY clause
>  Hitoshi> expressions didn't hit me.  If it's only me, no
>  Hitoshi> problem. Maybe additional document in #syntax-aggregates
>  Hitoshi> will do.
>
> How about:
>
>    ... But an exception occurs if the aggregate's arguments
>    (including any <literal>ORDER BY</> clause) contain only
>    outer-level variables: the aggregate then belongs to the nearest
>    such outer level, ...

Reasonable. Thank you.


>  Hitoshi> I don't have much experiences in VIEW systems, but isn't it
>  Hitoshi> enough to let "order by x" omitted? "select count(distinct x
>  Hitoshi> order by x) from table" means the same as "select
>  Hitoshi> count(distinct x) from table" currently. ruleutils can
>  Hitoshi> handle it if distinct expressions are equal to order by
>  Hitoshi> expressions.
>
> That doesn't work in more complex cases. For example, the user might
> specify aggfunc(distinct x,y order by x) (not caring about the relative
> order of y) but the code will still turn that internally into
> aggfunc(distinct x,y order by x,y). It's necessary to be able to recover
> what the user originally entered, which means needing to be able to
> distinguish both of those cases from aggfunc(distinct x,y).
>
With Tom's comment, this issue is closed now with some hope that
author will see if new code can be shared with traditional code once
more.

So I guess all of my review comments are get done. Could you update
your patch with doc patch in it? After that I'll test it again and
will mark this as "Ready for Committer" if no objection nor no problem
found.


Regards,


--
Hitoshi Harada


Re: Aggregate ORDER BY patch

From
Andrew Gierth
Date:
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
>> What about error handling? If the user specifies agg(distinct x)>> where x is not sortable, do we leave it to the
plannerto detect>> that (which means not reporting the error position?)
 
Tom> Well, at the moment there's only going to be a sort-basedTom> implementation, so I don't object to throwing an
errorfor thatTom> as soon as possible.  OTOH I wouldn't recommend expending a lotTom> of code to do it there.  I would
hopethat most of the parser'sTom> work for this can be shared with the existing support forTom> query-level ORDER
BY/DISTINCT.

The code already uses transformSortClause for most of the work, but
reusing the existing code for DISTINCT would have required more
refactoring than I was happy with, because transformDistinct etc.
all have error message text which is specific to SELECT DISTINCT etc.
Let's see how it falls out in the next patch.

-- 
Andrew.


Re: Aggregate ORDER BY patch

From
Andrew Gierth
Date:
Updated version of the aggregate order by patch.

Includes docs + regression tests all in the same patch.

Changes:

  - removed SortGroupClause.implicit as per review comments,
    replacing it with separate lists for Aggref.aggorder and
    Aggref.aggdistinct.

  - Refactored in order to move the bulk of the new parse code
    out of ParseFuncOrColumn which was already quite big enough,
    into parse_agg.c

  - fixed a bug with incorrect deparse in ruleutils (and added a
    bunch of regression tests for deparsing and view usage)

  - added some comments

--
Andrew (irc:RhodiumToad)


Attachment

Re: Aggregate ORDER BY patch

From
Hitoshi Harada
Date:
2009/11/30 Andrew Gierth <andrew@tao11.riddles.org.uk>:
> Updated version of the aggregate order by patch.
>
> Includes docs + regression tests all in the same patch.
>
> Changes:
>
>  - removed SortGroupClause.implicit as per review comments,
>    replacing it with separate lists for Aggref.aggorder and
>    Aggref.aggdistinct.
>
>  - Refactored in order to move the bulk of the new parse code
>    out of ParseFuncOrColumn which was already quite big enough,
>    into parse_agg.c
>
>  - fixed a bug with incorrect deparse in ruleutils (and added a
>    bunch of regression tests for deparsing and view usage)
>
>  - added some comments

It seems good to me. Everything that was pointed in the previous
review was fixed, as well as sufficient comments are added. It applies
very cleanly against HEAD and compiles without error/warning.

I found only trivial favors such like that a blank line is added
around line 595 in the patch, and "proj" in peraggstate sounds a
little weird to me because of surrounding "evaldesc" and "evalslot"
("evalproj" seems better to me). Also catversion update doesn't mean
anything for this feature. But these are not what prevent it from
review by a committer. So, although I'm going to look more on this
patch, I mark this item as "Ready for Committer" for now.


Regards,

--
Hitoshi Harada


Re: Aggregate ORDER BY patch

From
Alvaro Herrera
Date:
Hitoshi Harada escribió:

> I found only trivial favors such like that a blank line is added
> around line 595 in the patch, and "proj" in peraggstate sounds a
> little weird to me because of surrounding "evaldesc" and "evalslot"
> ("evalproj" seems better to me). Also catversion update doesn't mean
> anything for this feature. But these are not what prevent it from
> review by a committer. So, although I'm going to look more on this
> patch, I mark this item as "Ready for Committer" for now.

AFAICS the catversion bump is needed because of the change in a parser
node.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: Aggregate ORDER BY patch

From
Tom Lane
Date:
Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> Updated version of the aggregate order by patch.

I'm starting to look at this now.  I find it rather bizarre to merge
both the actual arguments of an aggregate and the optional ORDER BY
expressions into a single targetlist.  It doesn't seem like that would
be an especially convenient representation to work with, and I would
also expect there to be a nonzero performance hit from the extra
TargetEntry expression nodes, even when the feature is not in use.
Why didn't you use separate lists?
        regards, tom lane


Re: Aggregate ORDER BY patch

From
Andrew Gierth
Date:
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
>> Updated version of the aggregate order by patch.
Tom> I'm starting to look at this now.  I find it rather bizarre toTom> merge both the actual arguments of an aggregate
andthe optionalTom> ORDER BY expressions into a single targetlist.  It doesn't seemTom> like that would be an
especiallyconvenient representation toTom> work with,
 

It's extremely convenient, since you need the arguments and the
ordering expressions together in a slot in order to feed them to
tuplesort (in the general case where there is more than one
expression); you need a tupledesc to pass to tuplesort; and there are
existing functions to construct all of these things from the tlist.
Also, you want to merge the argument expressions and ordering
expressions where possible, and this is exactly what the existing
transformSortClause stuff expects.
Tom> and I would also expect there to be a nonzero performance hitTom> from the extra TargetEntry expression nodes,
evenwhen theTom> feature is not in use.
 

I tested for that and couldn't reliably detect any (certainly <2%
on count(i) on generated data, and under the circumstances I was
testing that's not necessarily outside the error margin).

-- 
Andrew (irc:RhodiumToad)


Re: Aggregate ORDER BY patch

From
Tom Lane
Date:
Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
>  Tom> and I would also expect there to be a nonzero performance hit
>  Tom> from the extra TargetEntry expression nodes, even when the
>  Tom> feature is not in use.

> I tested for that and couldn't reliably detect any (certainly <2%
> on count(i) on generated data, and under the circumstances I was
> testing that's not necessarily outside the error margin).

Hm.  Now that I look closer I see we already have a hack to avoid
executing an extra expression node when a targetlist item is evaluated,
so the extra cost should indeed be insignificant.  There would be
nonzero associated costs during planning and executor startup, but we
seldom bother optimizing for that.  So nevermind...
        regards, tom lane


Re: Aggregate ORDER BY patch

From
Tom Lane
Date:
Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> Updated version of the aggregate order by patch.

Applied with some editorialization.  The main change I made was to get
rid of all the ad-hoc DISTINCT handling in parse_agg.c and use
transformDistinctClause() instead.  This exposed what I believe to
be a bug in the submitted patch: it accepted cases like
agg(DISTINCT x ORDER BY x,y)

We do not allow that in ordinary query-level DISTINCT because it's
ambiguous --- there might be multiple y values for any particular value
of x, so the ordering is uncertain.  The only way it's not uncertain is
if the sort by x fully determines the order, in which case listing y is
merely useless.  I think this is something that was changed not too
long ago, so maybe you were trying to emulate the old behavior, but
in any case it's better to not have extra code that doesn't behave
quite like the normal case.
        regards, tom lane


Re: Aggregate ORDER BY patch

From
Andrew Gierth
Date:
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
Tom> Andrew Gierth <andrew@tao11.riddles.org.uk> writes:>> Updated version of the aggregate order by patch.
Tom> Applied with some editorialization.  The main change I made wasTom> to get rid of all the ad-hoc DISTINCT handling
inparse_agg.cTom> and use transformDistinctClause() instead.
 

I'll review that; I avoided that code intentionally because the
semantics of query-level DISTINCT are different enough.
Tom> This exposed what I believe to be a bug in the submitted patch:Tom> it accepted cases like
Tom>     agg(DISTINCT x ORDER BY x,y)

This is not a bug, it was done intentionally (as you might have
guessed from the fact that there was a regression test for it). The
additional ORDER BY column in this case is always safe (since DISTINCT
adopts the equality operator from the sort, it's not possible for
additional sort columns to break the DISTINCT). I allowed the case
since there was therefore no good reason to forbid it.

There is at least one case where this makes a visible difference in
query output: if the aggregate can distinguish values of x which are
considered equal by the sort operator used, then the value of y
affects which value of x is seen. It is probably relatively easy to
generate examples of this using the box type and array_agg.
Tom> We do not allow that in ordinary query-level DISTINCT

Note that ordinary query-level DISTINCT has the reverse semantics; the
DISTINCT operation is (per spec) logically prior to the order by, the
fact that they are planned in the reverse order is an implementation
detail.

Query-level DISTINCT shouldn't allow columns in the order by that
aren't in the select list because those columns _do not exist_ at the
point that ordering logically takes place (even though in the
implementation, they might).

This isn't the case for aggregate order by.

-- 
Andrew (irc:RhodiumToad)


Re: Aggregate ORDER BY patch

From
Tom Lane
Date:
Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> Query-level DISTINCT shouldn't allow columns in the order by that
> aren't in the select list because those columns _do not exist_ at the
> point that ordering logically takes place (even though in the
> implementation, they might).

> This isn't the case for aggregate order by.

I entirely disagree.  Why should the semantics of this combination of
ORDER BY and DISTINCT be different from what they are at the query
top level?  We made other decisions about this feature on the basis
of making the two cases work alike, and I don't think you've made an
adequate argument for making them act differently.
        regards, tom lane


Re: Aggregate ORDER BY patch

From
Andrew Gierth
Date:
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
>> Query-level DISTINCT shouldn't allow columns in the order by that>> aren't in the select list because those columns
_donot exist_ at>> the point that ordering logically takes place (even though in the>> implementation, they might).
 
>> This isn't the case for aggregate order by.
Tom> I entirely disagree.  Why should the semantics of thisTom> combination of ORDER BY and DISTINCT be different from
whattheyTom> are at the query top level?  We made other decisions about thisTom> feature on the basis of making the two
caseswork alike, and ITom> don't think you've made an adequate argument for making them actTom> differently.
 

A case could possibly be made that the behaviour of DISTINCT at top
level is wrong, or at least less useful than need be.

Notice that there are cases where agg(distinct x order by x) is
nondeterministic while agg(distinct x order by x,y) is deterministic.
In my view that alone is a good argument for allowing it.

-- 
Andrew (irc:RhodiumToad)


Re: Aggregate ORDER BY patch

From
Tom Lane
Date:
Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> Notice that there are cases where agg(distinct x order by x) is
> nondeterministic while agg(distinct x order by x,y) is deterministic.

Well, I think what you're really describing is a case where you're using
the wrong sort opclass.  If the aggregate can distinguish two values of
x, and the sort operator can't, use another sort operator that can.

If we really wanted to take the above seriously, my opinion is that
we ought to introduce DISTINCT ON in aggregates.  However, at that
point you lose the argument of standard syntax, so it's not real
clear why you shouldn't just fall back onselect agg(x) from (select distinct on (x) x ... order by x,y)
        regards, tom lane


Re: Aggregate ORDER BY patch

From
Marko Tiikkaja
Date:
On 2009-12-15 23:10 +0200, Tom Lane wrote:
> Andrew Gierth<andrew@tao11.riddles.org.uk>  writes:
>> Notice that there are cases where agg(distinct x order by x) is
>> nondeterministic while agg(distinct x order by x,y) is deterministic.
>
> Well, I think what you're really describing is a case where you're using
> the wrong sort opclass.  If the aggregate can distinguish two values of
> x, and the sort operator can't, use another sort operator that can.
>
> If we really wanted to take the above seriously, my opinion is that
> we ought to introduce DISTINCT ON in aggregates.  However, at that
> point you lose the argument of standard syntax, so it's not real
> clear why you shouldn't just fall back on
>     select agg(x) from (select distinct on (x) x ... order by x,y)

FWIW, in my opinion the idea behind this patch is to not fall back on 
hacks like that.  This patch already goes beyond the standard and having 
this seems like a useful feature in some cases.  Although the DISTINCT 
ON syntax would have a bit more resemblance on the existing syntax, I'd 
still like to see agg(distinct x order by x,y).

Just my $0.02.


Regards,
Marko Tiikkaja


Re: Aggregate ORDER BY patch

From
Tom Lane
Date:
Marko Tiikkaja <marko.tiikkaja@cs.helsinki.fi> writes:
> On 2009-12-15 23:10 +0200, Tom Lane wrote:
>> If we really wanted to take the above seriously, my opinion is that
>> we ought to introduce DISTINCT ON in aggregates.

> FWIW, in my opinion the idea behind this patch is to not fall back on 
> hacks like that.  This patch already goes beyond the standard and having 
> this seems like a useful feature in some cases.  Although the DISTINCT 
> ON syntax would have a bit more resemblance on the existing syntax, I'd 
> still like to see agg(distinct x order by x,y).

I remain entirely unconvinced.  If DISTINCT + ORDER BY work differently
inside aggregates than at query level, we're going to forever be
explaining the difference, fielding bug reports, etc.  Even documenting
the difference would be a serious PITA considering how subtle it is
(AFAICS Andrew's submitted doc patch failed to address the point).

I'm not against the idea of introducing DISTINCT ON here, though I think
perhaps we ought to wait for a release or so and see if there's really
any field demand for it.
        regards, tom lane


Re: Aggregate ORDER BY patch

From
Pavel Stehule
Date:
2009/12/19 Marko Tiikkaja <marko.tiikkaja@cs.helsinki.fi>:
> On 2009-12-15 23:10 +0200, Tom Lane wrote:
>>
>> Andrew Gierth<andrew@tao11.riddles.org.uk>  writes:
>>>
>>> Notice that there are cases where agg(distinct x order by x) is
>>> nondeterministic while agg(distinct x order by x,y) is deterministic.
>>
>> Well, I think what you're really describing is a case where you're using
>> the wrong sort opclass.  If the aggregate can distinguish two values of
>> x, and the sort operator can't, use another sort operator that can.
>>
>> If we really wanted to take the above seriously, my opinion is that
>> we ought to introduce DISTINCT ON in aggregates.  However, at that
>> point you lose the argument of standard syntax, so it's not real
>> clear why you shouldn't just fall back on
>>        select agg(x) from (select distinct on (x) x ... order by x,y)
>
> FWIW, in my opinion the idea behind this patch is to not fall back on hacks
> like that.  This patch already goes beyond the standard and having this
> seems like a useful feature in some cases.  Although the DISTINCT ON syntax
> would have a bit more resemblance on the existing syntax, I'd still like to
> see agg(distinct x order by x,y).
>

when we are talking about extensions - did you thing about LIMIT clause?

select agg(distinct x order by x limit 10) ..

Regards
Pavel



> Just my $0.02.
>
>
> Regards,
> Marko Tiikkaja
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>