Re: Window-functions patch handling of aggregates - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Window-functions patch handling of aggregates
Date
Msg-id 14175.1230319049@sss.pgh.pa.us
Whole thread Raw
In response to Re: Window-functions patch handling of aggregates  (Greg Stark <greg.stark@enterprisedb.com>)
Responses Re: Window-functions patch handling of aggregates
Re: Window-functions patch handling of aggregates
Re: Window-functions patch handling of aggregates
List pgsql-hackers
Greg Stark <greg.stark@enterprisedb.com> writes:
> Yeah, it seems like adding a flag like iswindowable to aggregate  
> functions is the safest option.

I agree with Hitoshi-san: that's passing information in the wrong
direction.  The right direction is to make it visible to the called
function which context it's being called in, so that it can do the
right thing (or at worst, throw an error if it can't).

The current state of play is that the documented expectation for
aggregate functions is that they should do
if (fcinfo->context && IsA(fcinfo->context, AggState))    ... optimized code for aggregate case ...else    ... support
regularcall, or throw error ...
 

However, a bit of grepping for AggState shows that this expectation
isn't met very well within the core distribution, let alone elsewhere.
There are about 10 aggregates that have optimizations like this, only
8 of which are playing by the rules --- the violators are:

tsearch2's tsa_rewrite_accum: just assumes it's been passed an AggState,
dumps core (or worse) if not
array_agg: Asserts it's been passed an AggState, dumps core if not

As submitted, Hitoshi's patch made the WindowAgg code pass its
WindowAggState to the aggregate functions, which at best would have the
result of disabling the internal optimizations of the aggregates, and
would result in a core dump in any aggregate that was taking a shortcut.
The working copy I have right now does this:
   if (numaggs > 0)   {       /*        * Set up a mostly-dummy AggState to be passed to plain aggregates        * so
theyknow they can optimize things.  We don't supply any useful        * info except for the ID of the
aggregate-lifetimecontext.        */       winstate->aggstate = makeNode(AggState);
winstate->aggstate->aggcontext= winstate->wincontext;   }
 

and then passes the dummy AggState to plain aggregates, instead of
WindowAggState.  This makes count(*) and most of the other aggs work
as desired, but it's not sufficient for array_agg because of that
release-the-data-structure-in-the-final-function optimization.

So the alternatives I see are:

1. Go back to Hitoshi's plan of passing WindowAggState to the
aggregates.  This will require changing every one of the ten aggregates
in the core distro, as well as every third-party aggregate that has
a similar optimization; and we just have to keep our fingers crossed
that anyone who's taking a short-cut will fix their code before it
fails in the field.

2. Use an intermediate dummy AggState as I have in my version, but
document some convention for telling this from a "real" AggState
when needed.  (Not hard, we just pick some field that would never be
zero in a real AggState and document testing that.)  This is certainly
on the ugly side, but it would very substantially cut the number of
places that need changes.  Only aggregates that are doing something
irreversible in their final-functions would need to be touched.

If we were working in a green field then #1 would clearly be the
preferable choice, but worrying about compatibility with existing
third-party aggregates is making me lean to #2.  Comments?
        regards, tom lane


pgsql-hackers by date:

Previous
From: Kurt Roeckx
Date:
Subject: Gcc 4.4 causes abort in plpython.
Next
From: "Joshua D. Drake"
Date:
Subject: Re: Window-functions patch handling of aggregates