Thread: Confusing documentation of ordered-set aggregates?

Confusing documentation of ordered-set aggregates?

From: Florian Pflug

Hi

After reading through the relevant parts of syntax.sgml, create_aggregate.sgml
and xaggr.sgml, I think I understand how these work - they work exactly like
regular aggregates, except that some arguments are evaluated only once and
passed to the final function instead of the transition function. The whole
"ORDER BY" thing is just crazy syntax the standard mandates - a saner
alternative would have been
ordered_set_agg(direct1,...,directN, WITHIN(arg1,...,argM))

or something like that, right?
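
For concreteness, here is how that split looks with one of the built-in ordered-set aggregates. This is only a minimal sketch: the VALUES list is made-up sample data, and the comments reflect my reading of the docs rather than anything authoritative.

```sql
-- 0.5 is the direct argument: evaluated once, handed to the final function.
-- x is the aggregated argument: evaluated once per input row.
SELECT percentile_disc(0.5) WITHIN GROUP (ORDER BY x)
FROM (VALUES (1), (2), (3), (4), (5)) AS t(x);
-- with these five rows the discrete median is 3
```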

So whether "ORDER BY" implies any actual ordering is up to the ordered-set
aggregate's final function. Or at least that's what xaggr.sgml seems to say
  Unlike the case for normal aggregates, the sorting of input rows for an
  ordered-set aggregate is <emphasis>not</> done behind the scenes, but is
  the responsibility of the aggregate's support functions.

but that seems to contradict syntax.sgml which says
  The expressions in the <replaceable>order_by_clause</replaceable> are
  evaluated once per input row just like normal aggregate arguments, sorted
  as per the <replaceable>order_by_clause</replaceable>'s requirements, and
  fed to the aggregate function as input arguments.

Also, xaggr.sgml has the following to explain why the NULLs are passed for all
aggregated arguments to the final function, instead of simply not passing them
at all
  While the null values seem useless at first sight, they are important because
  they make it possible to include the data types of the aggregated input(s) in
  the final function's signature, which may be necessary to resolve the output
  type of a polymorphic aggregate.

Why do ordered-set aggregates require that, when plain aggregates are fine
without it? array_agg(), for example, also has a result type that is
determined by the argument type, yet its final function doesn't take an
argument of type anyelement, even though it returns anyarray.

best regards,
Florian Pflug




Re: Confusing documentation of ordered-set aggregates?

From: Tom Lane

Florian Pflug <fgp@phlo.org> writes:
> After reading through the relevant parts of syntax.sgml, create_aggregate.sgml
> and xaggr.sgml, I think I understand how these work - they work exactly like
> regular aggregates, except that some arguments are evaluated only once and
> passed to the final function instead of the transition function.

Yeah, that statement is correct.

> The whole
> "ORDER BY" thing is just crazy syntax the standard mandates - a saner
> alternative would have been
>  ordered_set_agg(direct1,...,directN, WITHIN(arg1,...,argM))
> or something like that, right?

Not sure.  The syntax is certainly something out of far left field (which
is pretty much par for the course with the SQL committee :-().  But the
concept basically is "to the extent that your results depend on an assumed
ordering of the input rows, this is what to use".  That seems sane enough,
at least for aggregates where the input ordering does matter.

> So whether "ORDER BY" implies any actual ordering is up to the ordered-set
> aggregate's final function.

Yes, the committed patch intentionally doesn't force the aggregate to do
any ordering, though all the built-in aggregates do so.

> but that seems to contradict syntax.sgml which says

>  The expressions in the <replaceable>order_by_clause</replaceable> are
>  evaluated once per input row just like normal aggregate arguments, sorted
>  as per the <replaceable>order_by_clause</replaceable>'s requirements, and
>  fed to the aggregate function as input arguments.

Well, syntax.sgml is just trying to explain the user's-eye view.  I'm not
sure that it'd be helpful to say here that the implementation might choose
not to do a physical sort.

> Also, xaggr.sgml has the following to explain why the NULLs are passed for all
> aggregated arguments to the final function, instead of simply not passing them
> at all

>  While the null values seem useless at first sight, they are important because
>  they make it possible to include the data types of the aggregated input(s) in
>  the final function's signature, which may be necessary to resolve the output
>  type of a polymorphic aggregate.

> Why do ordered-set aggregates require that, when plain aggregates are fine
> without it?

Actually, if polymorphic types had existed when the original aggregate
infrastructure was designed, it might well have been done like that.
I was thinking while working on the ordered-set patch that this would
be a really nifty thing for regular polymorphic aggregates too.  Right
now, the only safe way to make a polymorphic plain aggregate is to use a
polymorphic state type, and that type has to be sufficient to determine
the result type.  If you'd like to define the state type as "internal",
you lose --- there's no connection between the input and result types.
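
To illustrate the "polymorphic state type" route, here is a minimal sketch of a plain polymorphic aggregate. The names my_append and my_array_accum are hypothetical, chosen only for this example; the shape follows the familiar array-accumulating pattern, with a hand-written transition function so the sketch does not depend on how the built-in array_append is declared in a given release.

```sql
-- Hypothetical transition function: appends one element to the state array.
CREATE FUNCTION my_append(anyarray, anyelement) RETURNS anyarray
    LANGUAGE sql IMMUTABLE
    AS 'SELECT $1 || $2';

-- Hypothetical aggregate: the state type anyarray is polymorphic, and since
-- there is no final function it is also the result type, so the input type
-- (anyelement) is tied to the result type through the state type alone.
CREATE AGGREGATE my_array_accum(anyelement) (
    sfunc = my_append,
    stype = anyarray,
    initcond = '{}'
);

-- Sample call with made-up data; yields an integer array here.
SELECT my_array_accum(x) FROM (VALUES (1), (2), (3)) AS t(x);
```

Replace stype with internal in a definition like this and the link between anyelement and the result type is gone, which is the problem described above.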

So I was wondering if we shouldn't think about how to allow regular
aggregates to use final functions defined in this style.  But it's
not something I've got time to pursue at the moment.

> array_agg(), for example, also has a result type that is
> determined by the argument type, yet its final function doesn't take an
> argument of type anyelement, even though it returns anyarray.

Yeah.  So it's a complete leap of faith on the type system's part that
this function is an appropriate final function for array_agg().  I'm
not sure offhand if CREATE AGGREGATE would even allow this combination
to be created, or if it only works because we manually jammed those rows
into the catalogs at initdb time.  But it would certainly be safer if
CREATE AGGREGATE *didn't* allow it.
        regards, tom lane



Re: Confusing documentation of ordered-set aggregates?

From: Tom Lane

I wrote:
> Florian Pflug <fgp@phlo.org> writes:
>> array_agg(), for example, also has a result type that is
>> determined by the argument type, yet its final function doesn't take an
>> argument of type anyelement, even though it returns anyarray.

> Yeah.  So it's a complete leap of faith on the type system's part that
> this function is an appropriate final function for array_agg().  I'm
> not sure offhand if CREATE AGGREGATE would even allow this combination
> to be created, or if it only works because we manually jammed those rows
> into the catalogs at initdb time.  But it would certainly be safer if
> CREATE AGGREGATE *didn't* allow it.

Actually, after a little bit of experimentation, the irreproducible
manual catalog hack is the very existence of array_agg_finalfn().
If you try to reproduce it via CREATE FUNCTION, the system will object:

regression=# create function foo(internal) returns anyarray as
regression-# 'array_agg_finalfn' language internal;
ERROR:  cannot determine result data type
DETAIL:  A function returning a polymorphic type must have at least one polymorphic argument.

So what the ordered-set-aggregate patch has done is introduce a principled
way to define polymorphic aggregates with non-polymorphic state types,
something we didn't have before.
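
One way to see this in a running system is to look at the catalog entries for the built-in ordered-set aggregates. The query below is a sketch that assumes a server new enough to have the aggkind and aggnumdirectargs columns in pg_aggregate (they arrived with the ordered-set patch); the column aliases are my own.

```sql
-- List ordered-set aggregates with their state type and the full
-- signature of their final function.
SELECT a.aggfnoid::oid::regprocedure   AS aggregate,
       a.aggnumdirectargs              AS direct_args,
       a.aggtranstype::regtype         AS state_type,
       a.aggfinalfn::oid::regprocedure AS final_function
FROM pg_aggregate AS a
WHERE a.aggkind = 'o'
ORDER BY 1;
```

Where the state type shows up as internal, the final function's signature should still list the aggregated input types, which is what lets the polymorphic result type be resolved.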
        regards, tom lane