Re: Parallel Aggregates for string_agg and array_agg - Mailing list pgsql-hackers

From	Magnus Hagander
Subject	Re: Parallel Aggregates for string_agg and array_agg
Date	March 27, 2018 10:06:59
Msg-id	CABUevEzy6=AT3meQf7q4jfP65nWjEGjPT_8D9n+BxanEgVozWw@mail.gmail.com
In response to	Re: Parallel Aggregates for string_agg and array_agg (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: Parallel Aggregates for string_agg and array_agg
List	pgsql-hackers

On Tue, Mar 27, 2018 at 12:28 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

David Rowley <david.rowley@2ndquadrant.com> writes:
> On 27 March 2018 at 09:27, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I do not think it is accidental that these aggregates are exactly the ones
>> that do not have parallelism support today. Rather, that's because you
>> just about always have an interest in the order in which the inputs get
>> aggregated, which is something that parallel aggregation cannot support.

> This very much reminds me of something that exists in the 8.4 release notes:
>> SELECT DISTINCT and UNION/INTERSECT/EXCEPT no longer always produce sorted output (Tom)

That's a completely false analogy: we got a significant performance
benefit for a significant fraction of users by supporting hashed
aggregation. My argument here is that only a very tiny fraction of
string_agg/array_agg users will not care about aggregation order, and thus
I don't believe that this patch can help very many people. Against that,
it's likely to hurt other people, by breaking their queries and forcing
them to insert expensive explicit sorts to fix it. Even discounting the
backwards-compatibility question, we don't normally adopt performance
features for which it's unclear that the net gain over all users is
positive.

I think you are quite wrong in claiming that only a tiny fraction of the users are going to care.

This may, and quite probably does, hold true for string_agg(), but not for array_agg(). I see a lot of cases where people use it to load the result into an unordered array/hashmap/set/whatever on the client side, which loses ordering *anyway*, and they would definitely benefit from it.

Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/
