Re: Memory-Bounded Hash Aggregation - Mailing list pgsql-hackers

From Jeff Davis
Subject Re: Memory-Bounded Hash Aggregation
Date
Msg-id 1b5c79a14f85e20899b100381c1e8d41e3148c5b.camel@j-davis.com
Whole thread Raw
In response to Re: Memory-Bounded Hash Aggregation  (Adam Lee <ali@pivotal.io>)
Responses Re: Memory-Bounded Hash Aggregation  (Adam Lee <ali@pivotal.io>)
List pgsql-hackers
On Wed, 2019-12-04 at 19:50 -0800, Adam Lee wrote:
> On Wed, Dec 04, 2019 at 06:55:43PM -0800, Jeff Davis wrote:
> > 
> > Thanks very much for a great review! I've attached a new patch.
> 
> Hi,
> 
> About the `TODO: project needed attributes only` in your patch, when
> would the input tuple contain columns not needed? It seems like
> anything
> you can project has to be in the group or aggregates.

If you have a table like:

   CREATE TABLE foo(i int, j int, x int, y int, z int);

And do:

   SELECT i, SUM(j) FROM foo GROUP BY i;

At least from a logical standpoint, you might expect that we project
only the attributes we need from foo before feeding them into the
HashAgg. But that's not quite how postgres works. Instead, it leaves
the tuples intact (which, in this case, means they have 5 attributes)
until after aggregation and lazily fetches whatever attributes are
referenced. Tuples are spilled from the input, at which time they still
have 5 attributes; so naively copying them is wasteful.

I'm not sure how often this laziness is really a win in practice,
especially after the expression evaluation has changed so much in
recent releases. So it might be better to just project all the
attributes eagerly, and then none of this would be a problem. If we
still wanted to be lazy about attribute fetching, that should still be
possible even if we did a kind of "logical" projection of the tuple so
that the useless attributes would not be relevant. Regardless, that's
outside the scope of the patch I'm currently working on.

What I'd like to do is copy just the attributes needed into a new
virtual slot, leave the unneeded ones NULL, and then write it out to
the tuplestore as a MinimalTuple. I just need to be sure to get the
right attributes.

Regards,
    Jeff Davis





pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: [HACKERS] Block level parallel vacuum
Next
From: Jeff Davis
Date:
Subject: Re: Memory-Bounded Hash Aggregation