
From Tomas Vondra
Subject Re: overhead due to casting extra parameters with aggregates (over and over)
Date 2019-09-23 17:44:47
Msg-id 20190923174447.lncaq7g5l244azia@development
In response to Re: overhead due to casting extra parameters with aggregates (over and over)  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Mon, Sep 23, 2019 at 12:53:36PM -0400, Tom Lane wrote:
>Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>> I've been working on a custom aggregate, and I've run into some fairly
>> annoying overhead due to casting direct parameters over and over. I'm
>> wondering if there's a way to eliminate this, somehow, without having to
>> do an explicit cast.
>
>> Imagine you have a simple aggregate:
>
>>   CREATE AGGREGATE tdigest_percentile(double precision, int, double precision[])
>>   (
>>     ...
>>   );
>
>> with two direct parameters (actually, I'm not sure that's the correct
>> term, because this is not an ordered-set aggregate and [1] only talks
>> about direct parameters in that context). Anyway, I'm talking about the
>> extra parameters, after the 'double precision' value to aggregate.
>
>But you're not telling the system that those are direct parameters,
>at least not if you mean that they can only legitimately have one value
>across the whole query.  As-is, they're just more aggregated arguments
>so we have to evaluate them again at each row.
>

Understood.
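
FWIW, here's a sketch of the kind of query I mean (the table, column and
percentile values are made up here, not the original query):

  -- The subquery yields numeric[], so the cast to double precision[]
  -- is part of the aggregated argument list and gets re-evaluated for
  -- every input row ("v" is assumed to be double precision).
  WITH perc AS (
    SELECT array_agg(p) AS p   -- numeric literals aggregate to numeric[]
    FROM generate_series(0.01, 0.99, 0.01) AS p
  )
  SELECT tdigest_percentile(v, 100,
                            (SELECT p FROM perc)::double precision[])
  FROM t;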

>It's fairly messy that the SQL spec ties direct arguments to ordered-set
>aggregates; you'd think there'd be some value in treating those features
>as orthogonal.  I'm not sure how we could wedge them into the syntax
>otherwise, though :-(.  You could perhaps convert your aggregate to
>an ordered-set aggregate, but then you'd be paying for a sort that
>you don't need, IIUC.
>

Yeah, having to do the sort (and keep all the data) is exactly what the
tdigest is meant to eliminate, so making it an ordered-set aggregate is
the one thing I don't want to do. It would also disable parallel query,
which is another reason not to go that way.
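
For completeness, the ordered-set form would be called like this (just a
sketch of the syntax, not something I plan to do):

  -- Hypothetical ordered-set variant: 100 and the percentile array
  -- become true direct arguments (evaluated once per group), but the
  -- input rows get sorted and the aggregate loses parallel safety.
  SELECT tdigest_percentile(100, ARRAY[0.95, 0.99]::double precision[])
         WITHIN GROUP (ORDER BY v)
  FROM t;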

>> After a while, I've realized that the issue is casting - the CTE
>> produces numeric[] array, and we do the cast to double precision[] on
>> every call to the state transition function (and we do ~10M of those).
>
>The only reason that the CTE reference is cheap is that we understand
>that it's stable so we don't have to recompute it each time; otherwise
>you'd be moaning about that more than the cast.  As you say, the short
>term workaround is to do the casting inside the sub-select.  I think the
>long term fix is to generically avoid re-computing stable subexpressions.
>There was a patch for that a year or so ago but the author never finished
>it, AFAIR.
>

Hmmm, yeah. I'll dig through the archives, although it's not a very high
priority - it's more a thing that surprised/bugged me while working on
the custom aggregate.
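
For the record, the short-term workaround looks like this (again a
sketch, not the original query) - moving the cast into the CTE means the
per-row expression is just the stable subquery reference:

  -- Cast once, inside the CTE, so the transition function receives
  -- double precision[] directly instead of casting numeric[] per row.
  WITH perc AS (
    SELECT array_agg(p)::double precision[] AS p
    FROM generate_series(0.01, 0.99, 0.01) AS p
  )
  SELECT tdigest_percentile(v, 100, (SELECT p FROM perc))
  FROM t;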


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 


