Re: AllocSetEstimateChunkSpace() - Mailing list pgsql-hackers

From Andres Freund
Subject Re: AllocSetEstimateChunkSpace()
Date
Msg-id 20200325220936.il3ni2fj2j2b45y5@alap3.anarazel.de
In response to Re: AllocSetEstimateChunkSpace()  (Jeff Davis <pgsql@j-davis.com>)
Responses Re: AllocSetEstimateChunkSpace()  (Jeff Davis <pgsql@j-davis.com>)
List pgsql-hackers
Hi,

On 2020-03-25 14:43:43 -0700, Jeff Davis wrote:
> On Wed, 2020-03-25 at 12:42 -0700, Andres Freund wrote:
> > As mention on im/call: I think this is mainly an argument for
> > combining
> > the group tuple & per-group state allocations. I'm kind of fine with
> > afterwards taking the allocator overhead into account.
> 
> The overhead comes from two places: (a) the chunk header; and (b) the
> round-up-to-nearest-power-of-two behavior.
> 
> Combining the group tuple and per-group states only saves the overhead
> from (a); it does nothing to help (b), which is often bigger.

Hm? It very well can help with (b), since the round-up only happens once
now. I guess you could argue that afterwards we'd be more likely to end
up in a bigger size class, and thus have roughly the same amount of
waste due to rounding? But I don't think that's all that convincing.

I still, as I mentioned on the call, suspect that the right thing here
is to use an allocation strategy that suffers from neither (a) nor (b)
(for tuple and pergroup) and that has faster allocations too. That would
then also mean that we don't need to care about per-allocation overhead
anymore (be it (a) or (b)).


> And it only saves that overhead when there *is* a per-group state
> (i.e. not for a DISTINCT query).

So?


> > I still don't buy that it's useful to estimate the by-ref transition
> > value overhead. We just don't have anything even close enough to a
> > meaningful value to base this on.
> 
> By-ref transition values aren't a primary motivation for me. I'm fine
> leaving that discussion separate if that's a sticking point. But if we
> do have a way to measure chunk overhead, I don't really see a reason
> not to use it for by-ref as well.

Well, my point is that it's pretty much pointless for by-ref types. The
size estimates, if they exist, are so inaccurate that we don't gain
anything by including it. As I said before, I think we'd be better off
initially assuming a higher transition space estimate.


> > Yea, the "context needs to exist" part sucks. I really don't want to
> > add
> > calls directly into AllocSet from more places though. And just
> > ignoring
> > the parameters to the context seems wrong too.
> 
> So do you generally favor the EstimateMemoryChunkSpace() patch (that
> works for all context types)? Or do you have another suggestion? Or do
> you think we should just do nothing?

I think I'm increasingly leaning towards either using a constant
overhead factor, or just getting rid of all memory context
overhead. There's clearly no obviously correct design for the "chunk
size" functions, and not having overhead is better than ~correctly
estimating it.

Greetings,

Andres Freund


