Re: Parallel Aggregates for string_agg and array_agg - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Parallel Aggregates for string_agg and array_agg
Date
Msg-id 20180501212127.6rqw4wj6osxtjyvx@alap3.anarazel.de
In response to Re: Parallel Aggregates for string_agg and array_agg  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On 2018-05-01 17:16:16 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2018-05-01 14:09:39 -0700, Mark Dilger wrote:
> >> I don't care which order the data is in, as long as x[i] and y[i] are
> >> matched correctly.  It sounds like this patch would force me to write
> >> that as, for example:
> >> 
> >> select array_agg(a order by a, b) AS x, array_agg(b order by a, b) AS y
> >> from generate_a_b_func(foo);
> >> 
> >> which I did not need to do before.
> 
> > Why would it require that? Rows are still processed row-by-row even if
> > there's parallelism, no?
> 
> Yeah, as long as we distribute all the aggregates in the same way,
> it seems like they'd all see the same random-ish input ordering.
> I can vaguely conceive of future optimizations that might break
> that, but not what we have today.

Yeah, a column store with a and b in different column sets, or a and b
originating from different tables with the two aggregates processed in
independent parts of the query tree, or other similar stuff, could cause
trouble for the above assumption. But that seems pretty unrelated to the
matter at hand...
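
To make the assumption concrete, here is a rough Python sketch of it. This is not PostgreSQL code. The function names, the random assignment of rows to workers, and the shuffled combine order are all made up for illustration. It only models the point above: if both aggregates consume the same rows in the same order, x[i] and y[i] stay matched however the rows are distributed, while independently ordered inputs give no such guarantee.

```python
import random


def parallel_array_agg_pair(rows, n_workers, seed=0):
    """Model two array_agg calls fed row-by-row by the same parallel scan.

    Hypothetical simulation: rows land on arbitrary workers, and partial
    states are combined in an arbitrary order, but that order is shared by
    both aggregates.
    """
    rng = random.Random(seed)
    # One (x_state, y_state) pair of partial aggregate states per worker.
    partials = [([], []) for _ in range(n_workers)]
    for a, b in rows:
        xs, ys = partials[rng.randrange(n_workers)]
        # Each worker handles a whole row at a time, so a and b are
        # appended to their states at the same position.
        xs.append(a)
        ys.append(b)
    # Combine the partial states in the same (arbitrary) worker order.
    order = list(range(n_workers))
    rng.shuffle(order)
    x, y = [], []
    for w in order:
        x.extend(partials[w][0])
        y.extend(partials[w][1])
    return x, y


def independent_array_agg_pair(rows, seed=0):
    """Model the problem case: each aggregate sees its own input ordering."""
    rng = random.Random(seed)
    x = [a for a, _ in rows]
    y = [b for _, b in rows]
    rng.shuffle(x)  # ordering seen by the first aggregate
    rng.shuffle(y)  # unrelated ordering seen by the second aggregate
    return x, y


rows = [(i, i * 10) for i in range(20)]

x, y = parallel_array_agg_pair(rows, n_workers=4)
same_scan_matched = set(zip(x, y)) == set(rows)

x2, y2 = independent_array_agg_pair(rows)
independent_matched = set(zip(x2, y2)) == set(rows)

print("same scan, pairs matched:", same_scan_matched)
print("independent inputs, pairs matched:", independent_matched)
```

In the first function the output order is unpredictable but the pairing always survives. In the second, nothing ties the two orderings together, so any match would be a coincidence.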

Greetings,

Andres Freund

