Re: Parallel Aggregates for string_agg and array_agg - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Parallel Aggregates for string_agg and array_agg
Msg-id 18594.1522099194@sss.pgh.pa.us
In response to Re: Parallel Aggregates for string_agg and array_agg  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: Parallel Aggregates for string_agg and array_agg  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-hackers
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> On 03/26/2018 10:27 PM, Tom Lane wrote:
>> I fear that what will happen, if we commit this, is that something like
>> 0.01% of the users of array_agg and string_agg will be pleased, another
>> maybe 20% will be unaffected because they wrote ORDER BY which prevents
>> parallel aggregation, and the remaining 80% will scream because we broke
>> their queries.  Telling them they should've written ORDER BY isn't going
>> to cut it, IMO, when the benefit of that breakage will accrue only to some
>> very tiny fraction of use-cases.

> Isn't the ordering unreliable *already*?

Not if the query is such that what gets chosen is, say, an indexscan or
mergejoin.  It might be theoretically unreliable and yet work fine for
a given application.
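
To make the distinction concrete, here is a minimal toy model in Python. It is not PostgreSQL code: the function names, the round-robin row distribution, and the combine step are all assumptions chosen for illustration. It shows how an aggregate fed by an ordered scan can look stable when run serially, while a partial-aggregate-then-combine plan rearranges the same rows.

```python
# Toy model only; none of these names correspond to PostgreSQL internals.

def serial_string_agg(rows, sep=","):
    """Aggregate rows in exactly the order the scan delivers them."""
    return sep.join(rows)


def parallel_string_agg(rows, n_workers, sep=","):
    """Split rows among workers, aggregate per worker, then combine."""
    # Assumption: round-robin stands in for however rows reach each worker.
    per_worker = [rows[w::n_workers] for w in range(n_workers)]
    # Each worker preserves the order of the rows it happened to see...
    partial_states = [sep.join(chunk) for chunk in per_worker if chunk]
    # ...but combining simply concatenates the partial states.
    return sep.join(partial_states)


# Rows as an ordered scan (for example, an index scan) might deliver them.
ordered_scan = ["a", "b", "c", "d", "e", "f"]

serial_result = serial_string_agg(ordered_scan)
parallel_result = parallel_string_agg(ordered_scan, n_workers=2)

print(serial_result)    # a,b,c,d,e,f
print(parallel_result)  # a,c,e,b,d,f
```

Both results contain the same elements; only the order differs, which is the kind of change an application silently relying on scan order would notice.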

I might be too pessimistic about the fraction of users who are depending
on ordered input without having written anything that explicitly forces
that ... but I stand by the theory that it substantially exceeds the
fraction of users who could get any benefit.

Your own example of assuming that separate aggregates are computed
in the same order reinforces my point, I think.  In principle, anybody
who's doing that should write

       array_agg(e order by x),
       array_agg(f order by x),
       string_agg(g order by x)

because otherwise they shouldn't assume that; the manual certainly doesn't
promise it.  But nobody does that in production, because if they did
they'd get killed by the fact that the sorts are all done independently.
(We should improve that someday, but it hasn't been done yet.)  So I think
there are an awful lot of people out there who are assuming more than a
lawyerly reading of the manual would allow.  Their reaction to this will
be about like ours every time the GCC guys decide that some longstanding
behavior of C code isn't actually promised by the text of the C standard.
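
The cost of the per-aggregate ORDER BY form can be sketched the same way. The Python below is a hypothetical model, not the executor's actual logic: `ordered_agg` and the `sort_calls` counter are invented names, and the only point is that three ordered aggregates over the same key each pay for their own sort.

```python
# Toy model only; ordered_agg and sort_calls are hypothetical names.

sort_calls = 0


def ordered_agg(rows, value_col, order_col):
    """Model one aggregate with its own ORDER BY: sort first, then collect."""
    global sort_calls
    sort_calls += 1  # every aggregate performs a separate sort of the input
    ordered = sorted(rows, key=lambda r: r[order_col])
    return [r[value_col] for r in ordered]


rows = [
    {"x": 3, "e": "e3", "f": "f3", "g": "g3"},
    {"x": 1, "e": "e1", "f": "f1", "g": "g1"},
    {"x": 2, "e": "e2", "f": "f2", "g": "g2"},
]

e_list = ordered_agg(rows, "e", "x")            # like array_agg(e order by x)
f_list = ordered_agg(rows, "f", "x")            # like array_agg(f order by x)
g_text = ",".join(ordered_agg(rows, "g", "x"))  # like string_agg(g order by x)

print(e_list, f_list, g_text)
print("sorts performed:", sort_calls)  # sorts performed: 3
```

The three outputs line up element by element, but the same input was sorted three times to get there.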

            regards, tom lane

