From Andrey M. Borodin
Subject Re: Thoughts about NUM_BUFFER_PARTITIONS
Date
Msg-id 1B6B9FE6-8B88-4043-A1B0-824B8EEF6785@yandex-team.ru
In response to Re: Thoughts about NUM_BUFFER_PARTITIONS  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
List pgsql-hackers
One of our customers recently asked me to look into buffer mapping.
Following is my POV on the problem of optimal NUM_BUFFER_PARTITIONS.

I’ve found some dead code: BufMappingPartitionLockByIndex() is unused, and has been for a long time. See patch 1.

> On 23 Feb 2024, at 22:25, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
>
> Well, if Postgres Pro implements this, I don't know what their reasoning
> was exactly, but I guess they wanted to make it easier to experiment
> with different values (without rebuild), or maybe they actually have
> systems where they know higher values help ...
>
> Note: I'd point the maximum value 8 translates to 256, so no - it does
> not max at the same value as PostgreSQL.

I’ve prototyped a similar GUC for anyone willing to do such experiments. See patches 2 and 4. Probably I’ll do some
experiments too, on customers' clusters and workloads :)

> Anyway, this value is inherently a trade off. If it wasn't, we'd set it
> to something super high from the start. But having more partitions of
> the lock table has a cost too, because some parts need to acquire all
> the partition locks (and that's O(N) where N = number of partitions).

I’ve found no such cases, actually. Or, perhaps, I was not looking hard enough.
pg_buffercache iterates over the buffers and releases locks as it goes. See patch 3 to fix the comments.

> Of course, not having enough lock table partitions has a cost too,
> because it increases the chance of conflict between backends (who happen
> to need to operate on the same partition). This constant is not
> constant, it changes over time - with 16 cores the collisions might have
> been rare, with 128 not so much. Also, with partitioning we may need
> many more locks per query.
>
> This means it's entirely possible it'd be good to have more than 128
> partitions of the lock table, but we only change this kind of stuff if
> we have 2 things:
>
> 1) clear demonstration of the benefits (e.g. a workload showing an
> improvement with higher number of partitions)
>
> 2) analysis of how this affects other workloads (e.g. cases that may
> need to lock all the partitions etc)
>
> Ultimately it's a trade off - we need to judge if the impact in (2) is
> worth the improvement in (1).
>
> None of this was done in this thread. There's no demonstration of the
> benefits, no analysis of the impact etc.
>
> As for turning the parameter into a GUC, that has a cost too. Either
> direct - a compiler can do far more optimizations with compile-time
> constants than with values that may change during execution, for
> example.

I think the overhead of finding a partition by hash is negligibly small.
num_partitions in HTAB controls the number of freelists, though; that might have some effect.

> Or indirect - if we can't give users any guidance how/when to
> tune the GUC, it can easily lead to misconfiguration (I can't even count
> how many times I had to deal with systems where the values were "tuned"
> following the logic that more is always better).

Yes, this argument is IMHO the most important one. By adding more such knobs we promote a superstitious approach to tuning.


Best regards, Andrey Borodin.
