Re: Parallel Aggregates for string_agg and array_agg - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Parallel Aggregates for string_agg and array_agg
Date
Msg-id 452e8b7f-418b-4e61-c212-c0e4410b94ad@2ndquadrant.com
Whole thread Raw
In response to Re: Parallel Aggregates for string_agg and array_agg  (David Rowley <david.rowley@2ndquadrant.com>)
Responses Re: Parallel Aggregates for string_agg and array_agg  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers

On 03/27/2018 04:51 AM, David Rowley wrote:
> On 27 March 2018 at 13:45, David Rowley <david.rowley@2ndquadrant.com> wrote:
>> On 27 March 2018 at 12:49, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Oh, I thought of another thing that would need to get done, if we decide
>>> to commit this.  array_agg_serialize/deserialize only work if the array
>>> element type has send/receive functions.  The planner's logic to decide
>>> whether partial aggregation is possible doesn't look any deeper than
>>> whether the aggregate has serialize/deserialize, but I think we have to
>>> teach it that if it's these particular serialize/deserialize functions,
>>> it needs to check the array element type.  Otherwise we'll get runtime
>>> failures if partial aggregation is attempted on array_agg().
>>>
>>> This would not matter for any core datatypes, since they all have
>>> send/receive functions, but I imagine it's still pretty common for
>>> extension types not to.
>>>
>>> For my money it'd be sufficient to hard-code the test like
>>>
>>>     if ((aggserialfn == F_ARRAY_AGG_SERIALIZE ||
>>>          aggdeserialfn == F_ARRAY_AGG_DESERIALIZE) &&
>>>         !array_element_type_has_send_and_recv(exprType(aggregate input)))
>>
>> I'd failed to consider this.
>>
>> Would it not be cleaner to just reject trans types without a
>> send/receive function? Or do you think that would exclude other valid
>> cases?
> 
> Seems I didn't mean "trans types". I should have said aggregate
> function argument types.
> 
> The attached I adds this check.
> 

I'm not sure that's better than the check proposed by Tom. An argument
type without send/receive function does not necessarily mean we can't
serialize/deserialize the trans value. Because who says the argument
value will be embedded in the trans value?

For example, I might create an aggregate to build a bloom filter,
accepting anyelement. That will hash the argument value, and obviously
has no issues with serializing/deserializing the trans value.

This check would effectively disable parallelism for all aggregates with
anyarray, anyelement, anynonarray, anyenum (and a few more) arguments.
That for example includes jsonb_agg, which some people already mentioned
as another candidate for enabling parallelism. Not great, I guess.

Tom's solution is much more focused - it recognizes that array_agg is a
fairly rare case where the (actual) argument type gets stored in the
trans type, and detects that.


PS: The function is misnamed - you're talking about argument types, yet
the function name refers to transtypes.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


pgsql-hackers by date:

Previous
From: Haribabu Kommi
Date:
Subject: Re: PQHost() undefined behavior if connecting string contains bothhost and hostaddr types
Next
From: Tomas Vondra
Date:
Subject: Re: [HACKERS] AdvanceXLInsertBuffer vs. WAL segment compressibility