Re: AllocSetEstimateChunkSpace() - Mailing list pgsql-hackers

From Andres Freund
Subject Re: AllocSetEstimateChunkSpace()
Date
Msg-id 20200325220936.il3ni2fj2j2b45y5@alap3.anarazel.de
In response to Re: AllocSetEstimateChunkSpace()  (Jeff Davis <pgsql@j-davis.com>)
Responses Re: AllocSetEstimateChunkSpace()  (Jeff Davis <pgsql@j-davis.com>)
List pgsql-hackers
Hi,

On 2020-03-25 14:43:43 -0700, Jeff Davis wrote:
> On Wed, 2020-03-25 at 12:42 -0700, Andres Freund wrote:
> > As mention on im/call: I think this is mainly an argument for
> > combining
> > the group tuple & per-group state allocations. I'm kind of fine with
> > afterwards taking the allocator overhead into account.
> 
> The overhead comes from two places: (a) the chunk header; and (b) the
> round-up-to-nearest-power-of-two behavior.
> 
> Combining the group tuple and per-group states only saves the overhead
> from (a); it does nothing to help (b), which is often bigger.

Hm? It very well can help with (b), since the round-up only happens once
now. I guess you could argue that afterwards we'd be more likely to end
up in a bigger size class, and thus have roughly the same amount of
waste due to rounding? But I don't think that's all that convincing.

I still, as I mentioned on the call, suspect that the right thing here
is to use an allocation strategy that suffers from neither (a) nor (b)
(for tuple and pergroup) and that has faster allocations too. That would
then also mean that we don't need to care about per-allocation overhead
anymore (be it (a) or (b)).


> And it only saves that overhead when there *is* a per-group state
> (i.e. not for a DISTINCT query).

So?


> > I still don't buy that it's useful to estimate the by-ref transition
> > value overhead. We just don't have anything even close enough to a
> > meaningful value to base this on.
> 
> By-ref transition values aren't a primary motivation for me. I'm fine
> leaving that discussion separate if that's a sticking point. But if we
> do have a way to measure chunk overhead, I don't really see a reason
> not to use it for by-ref as well.

Well, my point is that it's pretty much pointless for by-ref types. The
size estimates, if they exist, are so inaccurate that we don't gain
anything by including it. As I said before, I think we'd be better off
initially assuming a higher transition space estimate.


> > Yea, the "context needs to exist" part sucks. I really don't want to
> > add
> > calls directly into AllocSet from more places though. And just
> > ignoring
> > the parameters to the context seems wrong too.
> 
> So do you generally favor the EstimateMemoryChunkSpace() patch (that
> works for all context types)? Or do you have another suggestion? Or do
> you think we should just do nothing?

I think I'm increasingly leaning towards either using a constant
overhead factor, or just getting rid of all memory context
overhead. There's clearly no obviously correct design for the "chunk
size" functions, and not having overhead is better than ~correctly
estimating it.

Greetings,

Andres Freund


