On Mon, Sep 23, 2019 at 12:53:36PM -0400, Tom Lane wrote:
>Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>> I've been working on a custom aggregate, and I've ran into some fairly
>> annoying overhead due to casting direct parameters over and over. I'm
>> wondering if there's a way to eliminate this, somehow, without having to
>> do an explicit cast.
>
>> Imagine you have a simple aggregate:
>
>> CREATE AGGREGATE tdigest_percentile(double precision, int, double precision[])
>> (
>> ...
>> );
>
>> with two direct parameters (actually, I'm not sure that's the correct
>> term, becuse this is not an ordered-set aggregate and [1] only talks
>> about direct parameters in that context). Anyway, I'm talking about the
>> extra parameters, after the 'double precision' value to aggregate.
>
>But you're not telling the system that those are direct parameters,
>at least not if you mean that they can only legitimately have one value
>across the whole query. As-is, they're just more aggregated arguments
>so we have to evaluate them again at each row.
>
Understood.
>It's fairly messy that the SQL spec ties direct arguments to ordered-set
>aggregates; you'd think there'd be some value in treating those features
>as orthogonal. I'm not sure how we could wedge them into the syntax
>otherwise, though :-(. You could perhaps convert your aggregate to
>an ordered-set aggregate, but then you'd be paying for a sort that
>you don't need, IIUC.
>
Yeah, having to do the sort (and keep all the data) is exactly what the
tdigest is meant to eliminate, so making it an ordered-set aggregate is
exactly the thing I don't want to do. Also, it disables parallel query,
which is another reason not to do that.
>> After a while, I've realized that the issue is casting - the CTE
>> produces numeric[] array, and we do the cast to double precision[] on
>> every call to the state transition function (and we do ~10M of those).
>
>The only reason that the CTE reference is cheap is that we understand
>that it's stable so we don't have to recompute it each time; otherwise
>you'd be moaning about that more than the cast. As you say, the short
>term workaround is to do the casting inside the sub-select. I think the
>long term fix is to generically avoid re-computing stable subexpressions.
>There was a patch for that a year or so ago but the author never finished
>it, AFAIR.
>
Hmmm, yeah. I'll dig throgh the archives, although it's not a very high
priority - it's more a thing that surprised/bugged me while working on
the custom aggregate.
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services