Re: Parallel Aggregates for string_agg and array_agg - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Parallel Aggregates for string_agg and array_agg
Date
Msg-id 20180501213832.h6dp5zjophkqdz4h@alap3.anarazel.de
In response to Re: Parallel Aggregates for string_agg and array_agg  (Mark Dilger <hornschnorter@gmail.com>)
List pgsql-hackers
On 2018-05-01 14:35:46 -0700, Mark Dilger wrote:
> 
> > On May 1, 2018, at 2:11 PM, Andres Freund <andres@anarazel.de> wrote:
> > 
> > Hi,
> > 
> > On 2018-05-01 14:09:39 -0700, Mark Dilger wrote:
> >> I don't care which order the data is in, as long as x[i] and y[i] are
> >> matched correctly.  It sounds like this patch would force me to write
> >> that as, for example:
> >> 
> >> select array_agg(a order by a, b) AS x, array_agg(b order by a, b) AS y
> >>  from generate_a_b_func(foo);
> >> 
> >> which I did not need to do before.
> > 
> > Why would it require that? Rows are still processed row-by-row even if
> > there's parallelism, no?
> 
> I was responding in part to Tom's upthread statement:
> 
>   Your own example of assuming that separate aggregates are computed
>   in the same order reinforces my point, I think.  In principle, anybody
>   who's doing that should write
> 
>       array_agg(e order by x),
>       array_agg(f order by x),
>       string_agg(g order by x)
> 
>   because otherwise they shouldn't assume that;
> 
> It seems Tom is saying that you can't assume separate aggregates will be
> computed in the same order.  Hence my response.  What am I missing here?

AFAICT, Tom was just making a theoretical argument, and one that seems
largely independent of the form of parallelism we're discussing here.

Greetings,

Andres Freund
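
The "row-by-row" point made in the thread can be illustrated with a toy model. The sketch below is not PostgreSQL's executor. It assumes that each worker feeds every input row to both aggregates' transition states in the same step, and that the partial states for both aggregates are then combined in the same worker order. The names partial_aggregate and combine, the sample rows, and the split into two chunks are all hypothetical.

```python
from typing import List, Tuple

Row = Tuple[int, str]


def partial_aggregate(rows: List[Row]) -> Tuple[List[int], List[str]]:
    """One worker: advance both aggregate states once per input row."""
    xs: List[int] = []
    ys: List[str] = []
    for a, b in rows:
        xs.append(a)  # transition step standing in for array_agg(a)
        ys.append(b)  # transition step standing in for array_agg(b)
    return xs, ys


def combine(
    partials: List[Tuple[List[int], List[str]]]
) -> Tuple[List[int], List[str]]:
    """Leader: merge partial states, visiting workers in one shared order."""
    xs: List[int] = []
    ys: List[str] = []
    for part_x, part_y in partials:
        xs.extend(part_x)
        ys.extend(part_y)
    return xs, ys


rows: List[Row] = [(1, "one"), (2, "two"), (3, "three"), (4, "four"), (5, "five")]

# Assumed, arbitrary split of the input across two workers; the second
# half is deliberately placed first to mimic an unpredictable arrival order.
chunks = [rows[3:], rows[:3]]

x, y = combine([partial_aggregate(chunk) for chunk in chunks])

# The overall order differs from the input order, but x[i] and y[i]
# still come from the same input row.
pairs = list(zip(x, y))
```

Under these assumptions the overall element order is not the input order, yet every (x[i], y[i]) pair is still a row from the input, which is the property Mark's query relies on. If the two aggregates were instead combined in different worker orders, the pairing would break, which is the theoretical case Tom's ORDER BY advice guards against.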

