Re: Combining Aggregates - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Combining Aggregates
Date
Msg-id CA+TgmoaSWZqoNiy3F21DBxoHjF1y1pUpx8QQTVWBpfo-Pnf4+w@mail.gmail.com
Whole thread Raw
In response to Re: Combining Aggregates  (David Rowley <david.rowley@2ndquadrant.com>)
Responses Re: Combining Aggregates  (Pavel Stehule <pavel.stehule@gmail.com>)
Re: Combining Aggregates  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Mon, Jan 18, 2016 at 6:32 AM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
>> In my tests, this seems to be slightly slower than what we're doing
>> today; worst of all, it adds a handful of cycles to
>> advance_transition_function() even when the aggregate is not an
>> expanded object or, indeed, not even pass-by-reference.  Some of this
>> might be able to be fixed by a little massaging - in particular,
>> DatumIsReadWriteExpandedObject() seems like it could be partly or
>> entirely inlined, and maybe there's some other way to improve the
>> coding here.
>
> It also seems that an expanded object requires a new memory context which
> must be malloc()d and free()d. This has added quite an overhead in my
> testing. I assume that we must be doing that so that we can ensure that all
> memory is properly free()d once we're done with the expanded-object.

Yeah.  The API contract for an expanded object took me quite a while
to puzzle out, but it seems to be this: if somebody hands you an R/W
pointer to an expanded object, you're entitled to assume that you can
"take over" that object and mutate it however you like.  But the
object might be in some other memory context, so you have to move it
into your own memory context.  That's implementing by reparenting the
object's context under your context.  This is nice because it's O(1) -
whereas copying would be O(n) - but it's sort of aggravating, too.  In
this case, the transition function already knows that it wants
everything in the per-agg state, but it can't just create everything
there and be done with it, because nodeAgg.c has to content with the
possibility that some other piece of code will return an expanded
object that *isn't* parented to the aggcontext.  And in fact one of
the regression tests does exactly that, which caused me lots of
head-banging yesterday.

Making it worse, the transition function isn't required to have the
same behavior every time: the one in the regression tests sometimes
returns an expanded-object pointer, and sometimes doesn't, and if
nodeAgg.c reparents that pointer to the aggcontext, the next call to
the transition function reparents the pointer BACK to the per-tuple
context, so you can't even optimize away the repeated set-parent
calls.  Uggh.

> create table ab(a int, b text);
> insert into ab select x,'aaaaaaaaaaaaaaaaaaaaaaaaaaa' from
> generate_Series(1,1000000) x(x);
> set work_mem='1GB';
> vacuum analyze ab;
>
> explain analyze select a%1000000,length(string_agg(b,',')) from ab group by
> 1;
>
> Patched
> 1521.045 ms
> 1515.905 ms
> 1530.920 ms
>
> Master
> 932.457 ms
> 959.285 ms
> 991.021 ms
>
> Although performance of the patched version is closer to master when we have
> less groups, I felt it necessary to show the extreme case. I feel this puts
> a bit of a dampener on the future of this idea, as I've previously had
> patches rejected for adding 2-5% on planning time alone, adding over 50%
> execution time seems like a dead-end.

Thanks for working up this test case.  I certainly agree that adding
50% execution time is a dead-end, but I wonder if that problem can be
fixed somehow.  It would be a shame to find out that expanded-objects
can't be effectively used for anything other than the purposes for
which they were designed, namely speeding up PL/pgsql array
operations.

> seems neater. I just don't want to waste my time writing all the code
> required to replace all INTERNAL aggregate states when I know the
> performance regression is going to be unacceptable.

No argument there.  If the performance regression isn't fixable, this
approach is doomed.

> I also witnessed another regression with your patch when testing on another
> machine. It caused the plan to change to a HashAgg instead of GroupAgg
> causing a significant slowdown.
>
> Unpatched
>
> # explain analyze select a%1000000,length(string_agg(b,',')) from ab group
> by 1;
>                                                         QUERY PLAN
>
---------------------------------------------------------------------------------------------------------------------------
>  GroupAggregate  (cost=119510.84..144510.84 rows=1000000 width=32) (actual
> time=538.938..1015.278 rows=1000000 loops=1)
>    Group Key: ((a % 1000000))
>    ->  Sort  (cost=119510.84..122010.84 rows=1000000 width=32) (actual
> time=538.917..594.194 rows=1000000 loops=1)
>          Sort Key: ((a % 1000000))
>          Sort Method: quicksort  Memory: 102702kB
>          ->  Seq Scan on ab  (cost=0.00..19853.00 rows=1000000 width=32)
> (actual time=0.016..138.964 rows=1000000 loops=1)
>  Planning time: 0.146 ms
>  Execution time: 1047.511 ms
>
>
> Patched
> # explain analyze select a%1000000,length(string_agg(b,',')) from ab group
> by 1;
>                                                        QUERY PLAN
>
------------------------------------------------------------------------------------------------------------------------
>  HashAggregate  (cost=24853.00..39853.00 rows=1000000 width=32) (actual
> time=8072.346..144424.872 rows=1000000 loops=1)
>    Group Key: (a % 1000000)
>    ->  Seq Scan on ab  (cost=0.00..19853.00 rows=1000000 width=32) (actual
> time=0.025..481.332 rows=1000000 loops=1)
>  Planning time: 0.164 ms
>  Execution time: 263288.332 ms

Well, that's pretty odd.  I guess the plan change must be a result of
switching the transition type from internal to text, although I'm not
immediately certain why that would make a difference.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: [PATCH] better systemd integration
Next
From: Alvaro Herrera
Date:
Subject: Re: RFC: replace pg_stat_activity.waiting with something more descriptive