Re: Memory-Bounded Hash Aggregation - Mailing list pgsql-hackers

From Adam Lee
Subject Re: Memory-Bounded Hash Aggregation
Date
Msg-id 20191210213422.GD1345@mars.local
Whole thread Raw
In response to Re: Memory-Bounded Hash Aggregation  (Jeff Davis <pgsql@j-davis.com>)
Responses Re: Memory-Bounded Hash Aggregation
List pgsql-hackers
On Wed, Dec 04, 2019 at 10:57:51PM -0800, Jeff Davis wrote:
> > About the `TODO: project needed attributes only` in your patch, when
> > would the input tuple contain columns not needed? It seems like
> > anything
> > you can project has to be in the group or aggregates.
> 
> If you have a table like:
> 
>    CREATE TABLE foo(i int, j int, x int, y int, z int);
> 
> And do:
> 
>    SELECT i, SUM(j) FROM foo GROUP BY i;
> 
> At least from a logical standpoint, you might expect that we project
> only the attributes we need from foo before feeding them into the
> HashAgg. But that's not quite how postgres works. Instead, it leaves
> the tuples intact (which, in this case, means they have 5 attributes)
> until after aggregation and lazily fetches whatever attributes are
> referenced. Tuples are spilled from the input, at which time they still
> have 5 attributes; so naively copying them is wasteful.
> 
> I'm not sure how often this laziness is really a win in practice,
> especially after the expression evaluation has changed so much in
> recent releases. So it might be better to just project all the
> attributes eagerly, and then none of this would be a problem. If we
> still wanted to be lazy about attribute fetching, that should still be
> possible even if we did a kind of "logical" projection of the tuple so
> that the useless attributes would not be relevant. Regardless, that's
> outside the scope of the patch I'm currently working on.
> 
> What I'd like to do is copy just the attributes needed into a new
> virtual slot, leave the unneeded ones NULL, and then write it out to
> the tuplestore as a MinimalTuple. I just need to be sure to get the
> right attributes.
> 
> Regards,
>     Jeff Davis

Melanie and I tried this, had a installcheck passed patch. The way how
we verify it is composing a wide table with long unnecessary text
columns, then check the size it writes on every iteration.

Please check out the attachment, it's based on your 1204 version.

-- 
Adam Lee

Attachment

pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: log bind parameter values on error
Next
From: Jens-Wolfhard Schicke-Uffmann
Date:
Subject: Re: Contention on LWLock buffer_content, due to SHARED lock(?)