Re: Parallel Aggregates for string_agg and array_agg - Mailing list pgsql-hackers

From Andrew Dunstan
Subject Re: Parallel Aggregates for string_agg and array_agg
Date
Msg-id CAA8=A7-OrZDOcRwZjqNv8nNEQzE8JSbjTsAjv2CK2zyq4jPe+A@mail.gmail.com
In response to Re: Parallel Aggregates for string_agg and array_agg  (Magnus Hagander <magnus@hagander.net>)
List pgsql-hackers
On Tue, Mar 27, 2018 at 5:36 PM, Magnus Hagander <magnus@hagander.net> wrote:
> On Tue, Mar 27, 2018 at 12:28 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>
>> David Rowley <david.rowley@2ndquadrant.com> writes:
>> > On 27 March 2018 at 09:27, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> >> I do not think it is accidental that these aggregates are exactly the ones
>> >> that do not have parallelism support today.  Rather, that's because you
>> >> just about always have an interest in the order in which the inputs get
>> >> aggregated, which is something that parallel aggregation cannot support.
>>
>> > This very much reminds me of something that exists in the 8.4 release
>> > notes:
>> >> SELECT DISTINCT and UNION/INTERSECT/EXCEPT no longer always produce
>> >> sorted output (Tom)
>>
>> That's a completely false analogy: we got a significant performance
>> benefit for a significant fraction of users by supporting hashed
>> aggregation.  My argument here is that only a very tiny fraction of
>> string_agg/array_agg users will not care about aggregation order, and thus
>> I don't believe that this patch can help very many people.  Against that,
>> it's likely to hurt other people, by breaking their queries and forcing
>> them to insert expensive explicit sorts to fix it.  Even discounting the
>> backwards-compatibility question, we don't normally adopt performance
>> features for which it's unclear that the net gain over all users is
>> positive.
>
>
> I think you are quite wrong in claiming that only a tiny fraction of the
> users are going to care.
>
> This may, and quite probably does, hold true for string_agg(), but not for
> array_agg(). I see a lot of cases where people use that to load it into an
> unordered array/hashmap/set/whatever on the client side, which loses
> ordering *anyway*, and they would definitely benefit from it.

Agreed. I have seen lots of uses of array_agg where the order didn't matter.

cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

