Re: AllocSetEstimateChunkSpace() - Mailing list pgsql-hackers

From Andres Freund
Subject Re: AllocSetEstimateChunkSpace()
Date
Msg-id 20200325194235.3fciqzxvbyg7q2c3@alap3.anarazel.de
In response to Re: AllocSetEstimateChunkSpace()  (Jeff Davis <pgsql@j-davis.com>)
Responses Re: AllocSetEstimateChunkSpace()  (Jeff Davis <pgsql@j-davis.com>)
List pgsql-hackers
Hi,

On 2020-03-24 18:12:03 -0700, Jeff Davis wrote:
> Attached is a small patch that introduces a simple function,
> AllocSetEstimateChunkSpace(), and uses it to improve the estimate for
> the size of a hash entry for hash aggregation.
> 
> Getting reasonable estimates for the hash entry size (including the
> TupleHashEntryData, the group key tuple, the per-group state, and by-
> ref transition values) is important for multiple reasons:
> 
> * It helps the planner know when hash aggregation is likely to spill,
> and how to cost it.
> 
> * It helps hash aggregation regulate itself by setting a group limit
> (separate from the memory limit) to account for growing by-ref
> transition values.
> 
> * It helps choose a good initial size for the hash table. Too large of
> a hash table will crowd out space that could be used for the group keys
> or the per-group state.
> 
> Each group requires up to three palloc chunks: one for the group key
> tuple, one for the per-group states, and one for a by-ref transition
> value. Each chunk can have a lot of overhead: in addition to the chunk
> header (16 bytes overhead), it also rounds the value up to a power of
> two (~25% overhead). So, it's important to account for this chunk
> overhead.

As mentioned on IM/call: I think this is mainly an argument for combining
the group tuple & per-group state allocations. I'm kind of fine with
afterwards taking the allocator overhead into account.


I still don't buy that it's useful to estimate the by-ref transition
value overhead. We just don't have anything close enough to a
meaningful value to base this on. Even if we want to consider the
initial transition value or something, we'd be better off initially
over-estimating the size of the transition state by a lot more than 25%
(I am thinking more like 4x or so, with a minimum of 128 bytes or
so). Since this is about the initial size of the hashtable, we're better
off with a too-small table, imo. A "too large" table is more likely to
end up needing to spill when filled to only a small degree.


I am kind of wondering if there's actually much point in trying to be
accurate here at all. Especially when doing this from the
planner. Since, for a large fraction of aggregates, we're going to very
roughly guess at transition space anyway, it seems like a bunch of
"overhead factors" could turn out to be better than trying to be
accurate on some parts, while still wildly guessing at transition space.
But I'm not sure.


> I considered making it a method of a memory context, but the planner
> will call this function before the hash agg memory context is created.
> It seems to make more sense for this to just be an AllocSet-specific
> function for now.

-1 to this approach. I think it's architecturally the wrong direction to
add more direct calls to functions within specific contexts.



On 2020-03-25 11:46:31 -0700, Jeff Davis wrote:
> On Tue, 2020-03-24 at 18:12 -0700, Jeff Davis wrote:
> > I considered making it a method of a memory context, but the planner
> > will call this function before the hash agg memory context is
> > created.
> 
> I implemented this approach also; attached.
> 
> It works a little better by having access to the memory context, so it
> knows set->allocChunkLimit. It also allows it to work with the slab
> allocator, which needs the memory context to return a useful number.
> However, this introduces more code and awkwardly needs to use
> CurrentMemoryContext when called from the planner (because the actual
> memory context we're trying to estimate for doesn't exist yet).

Yea, the "context needs to exist" part sucks. I really don't want to add
calls directly into AllocSet from more places though. And just ignoring
the parameters to the context seems wrong too.

Greetings,

Andres Freund


