Thread: Thoughts about NUM_BUFFER_PARTITIONS

Thoughts about NUM_BUFFER_PARTITIONS

From
wenhui qiu
Date:
Hi hackers,
    While reading the documentation at https://www.interdb.jp/pg/pgsql08/03.html, I came across this paragraph:
/*
The BufMappingLock is split into partitions to reduce contention in the buffer table (the default is 128 partitions). Each BufMappingLock partition guards a portion of the corresponding hash bucket slots.
*/

Physical servers with terabytes of RAM are now commonplace, so I have been looking at the comments in the source code to see whether they still match current hardware capabilities. The comment says this could become a runtime parameter in the future. I am a DBA and have thought about implementing this parameter myself, but I am not a professional kernel developer, so I would like to ask the community's senior developers to implement it.

/*
 * It's a bit odd to declare NUM_BUFFER_PARTITIONS and NUM_LOCK_PARTITIONS
 * here, but we need them to figure out offsets within MainLWLockArray, and
 * having this file include lock.h or bufmgr.h would be backwards.
 */

/* Number of partitions of the shared buffer mapping hashtable */
#define NUM_BUFFER_PARTITIONS  128

/*
 * The number of partitions for locking purposes.  This is set to match
 * NUM_BUFFER_PARTITIONS for now, on the basis that whatever's good enough for
 * the buffer pool must be good enough for any other purpose.  This could
 * become a runtime parameter in future.
 */
#define DSHASH_NUM_PARTITIONS_LOG2 7
#define DSHASH_NUM_PARTITIONS (1 << DSHASH_NUM_PARTITIONS_LOG2)
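
For context, here is a minimal standalone sketch (illustrative only, not the actual PostgreSQL code) of what the partitioning means in practice: the buffer tag's hash code selects one of the NUM_BUFFER_PARTITIONS partitions, so two backends contend on the same BufMappingLock only when their hash codes land in the same partition.

#include <stdio.h>
#include <stdint.h>

#define NUM_BUFFER_PARTITIONS 128

/* Illustrative only: pick the BufMappingLock partition for a buffer-tag
 * hash code; as I understand it, PostgreSQL does essentially this kind
 * of modulo reduction. */
static unsigned partition_for(uint32_t hashcode)
{
    return (unsigned) (hashcode % NUM_BUFFER_PARTITIONS);
}

int main(void)
{
    uint32_t hashes[] = { 0x1A2B3C4Du, 0xDEADBEEFu, 42u };

    for (int i = 0; i < 3; i++)
        printf("hash code 0x%08X -> partition %u\n",
               (unsigned) hashes[i], partition_for(hashes[i]));
    return 0;
}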

Re: Thoughts about NUM_BUFFER_PARTITIONS

From
Heikki Linnakangas
Date:
On 08/02/2024 12:17, wenhui qiu wrote:
> Hi hackers,
>      While reading the documentation at 
> https://www.interdb.jp/pg/pgsql08/03.html, I came across this paragraph:
> /*
> The BufMappingLock is split into partitions to reduce contention in the 
> buffer table (the default is 128 partitions). Each BufMappingLock 
> partition guards a portion of the corresponding hash bucket slots.
> */
> 
> Physical servers with terabytes of RAM are now commonplace, so I have been 
> looking at the comments in the source code to see whether they still match 
> current hardware capabilities.

The optimal number of partitions has more to do with the number of 
concurrent processes using the buffer cache, rather than the size of the 
cache. The number of CPUs in servers has increased too, but not as much 
as RAM.

But yeah, it's a valid question whether the settings still make sense with 
modern hardware.

> The comment says this could become a runtime parameter in the future. I am 
> a DBA and have thought about implementing this parameter myself, but I am 
> not a professional kernel developer, so I would like to ask the community's 
> senior developers to implement it.

The first thing to do would be to benchmark with different 
NUM_BUFFER_PARTITIONS settings, and see if there's benefit in having 
more partitions.

-- 
Heikki Linnakangas
Neon (https://neon.tech)




Re: Thoughts about NUM_BUFFER_PARTITIONS

From
wenhui qiu
Date:
Hi Heikki Linnakangas 
    I think the larger the shared buffer pool, the higher the probability of multiple backend processes accessing the same bucket slot's BufMappingLock simultaneously (see InitBufTable(NBuffers + NUM_BUFFER_PARTITIONS)). When I have free time, I want to run this test myself. I have seen some tests, but the report is in Chinese.


Best wishes


Re: Thoughts about NUM_BUFFER_PARTITIONS

From
Tomas Vondra
Date:
On 2/8/24 14:27, wenhui qiu wrote:
> Hi Heikki Linnakangas
>     I think the larger the shared buffer pool, the higher the probability
> of multiple backend processes accessing the same bucket slot's
> BufMappingLock simultaneously (see InitBufTable(NBuffers +
> NUM_BUFFER_PARTITIONS)). When I have free time, I want to run this test
> myself. I have seen some tests, but the report is in Chinese.
> 

I think Heikki is right that this is unrelated to the amount of RAM. The
partitions are meant to reduce the number of lock collisions when
multiple processes try to map a buffer concurrently. But the machines
got much larger in this regard too - in 2006 the common CPUs had maybe
2-4 cores, now it's common to have CPUs with ~100 cores, and systems
with multiple of them. OTOH the time spent holding the partition lock
should be pretty low, IIRC we pretty much just pin the buffer before
releasing it, and the backend should do plenty of other expensive stuff. So
who knows how many backends end up doing the locking at the same time.

OTOH, with 128 partitions it takes just 14 backends to have 50% chance
of a conflict, so with enough cores ... But how many partitions would be
enough? With 1024 partitions it still takes only 38 backends to get 50%
chance of a collision. Better, but considering we now have hundreds of
cores, not sure if sufficient.

(Obviously, we probably want a much lower probability of a collision; I
only used 50% to illustrate the changes.)
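
For anyone who wants to reproduce those numbers, here is a small standalone program, under the simplifying assumption that the backends' hash codes are uniform and independent; it prints roughly 0.52 and 0.50 for the two cases above.

#include <stdio.h>

/* Birthday-style estimate: probability that at least two of k backends
 * pick the same of n partitions, assuming uniform, independent choices. */
static double collision_prob(int n, int k)
{
    double p_all_distinct = 1.0;

    for (int i = 0; i < k; i++)
        p_all_distinct *= (double) (n - i) / n;
    return 1.0 - p_all_distinct;
}

int main(void)
{
    printf("128 partitions, 14 backends:  %.2f\n", collision_prob(128, 14));
    printf("1024 partitions, 38 backends: %.2f\n", collision_prob(1024, 38));
    return 0;
}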

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Thoughts about NUM_BUFFER_PARTITIONS

From
Li Japin
Date:

> On Feb 10, 2024, at 20:15, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
> 
> I think Heikki is right that this is unrelated to the amount of RAM. The
> partitions are meant to reduce the number of lock collisions when
> multiple processes try to map a buffer concurrently. But the machines
> got much larger in this regard too - in 2006 the common CPUs had maybe
> 2-4 cores, now it's common to have CPUs with ~100 cores, and systems
> with multiple of them. OTOH the time spent holding the partition lock
> should be pretty low, IIRC we pretty much just pin the buffer before
> releasing it, and the backend should do plenty of other expensive stuff. So
> who knows how many backends end up doing the locking at the same time.
> 
> OTOH, with 128 partitions it takes just 14 backends to have 50% chance
> of a conflict, so with enough cores ... But how many partitions would be
> enough? With 1024 partitions it still takes only 38 backends to get 50%
> chance of a collision. Better, but considering we now have hundreds of
> cores, not sure if sufficient.
> 
> (Obviously, we probably want a much lower probability of a collision; I
> only used 50% to illustrate the changes.)
> 

It seems we need to change MAX_SIMUL_LWLOCKS if we enlarge NUM_BUFFER_PARTITIONS,
but I didn't find any comments describing the relation between MAX_SIMUL_LWLOCKS and
NUM_BUFFER_PARTITIONS. Am I missing something?

Re: Thoughts about NUM_BUFFER_PARTITIONS

From
Tomas Vondra
Date:
On 2/18/24 03:30, Li Japin wrote:
> 
> It seems we need to change MAX_SIMUL_LWLOCKS if we enlarge NUM_BUFFER_PARTITIONS,
> but I didn't find any comments describing the relation between MAX_SIMUL_LWLOCKS and
> NUM_BUFFER_PARTITIONS. Am I missing something?

IMHO the relationship is pretty simple - MAX_SIMUL_LWLOCKS needs to be
higher than NUM_BUFFER_PARTITIONS, so that the backend can acquire all
the partition locks if needed.

There are other places that acquire a bunch of locks, and all of them need
to be careful not to exceed MAX_SIMUL_LWLOCKS. For example GiST has
GIST_MAX_SPLIT_PAGES.
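
As a sketch of that constraint (a hypothetical compile-time guard, not something PostgreSQL actually has; MAX_SIMUL_LWLOCKS is 200 in the current sources as far as I can tell, which leaves 72 locks of slack):

#include <stdio.h>

/* Hypothetical guard, not present in PostgreSQL itself: it only makes the
 * rule explicit that one backend must be able to hold every buffer mapping
 * partition lock at once, with slack left over for other LWLocks. */
#define NUM_BUFFER_PARTITIONS 128
#define MAX_SIMUL_LWLOCKS     200

_Static_assert(MAX_SIMUL_LWLOCKS > NUM_BUFFER_PARTITIONS,
               "MAX_SIMUL_LWLOCKS must exceed NUM_BUFFER_PARTITIONS");

int main(void)
{
    printf("slack left for other locks: %d\n",
           MAX_SIMUL_LWLOCKS - NUM_BUFFER_PARTITIONS);
    return 0;
}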


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Thoughts about NUM_BUFFER_PARTITIONS

From
Japin Li
Date:
On Mon, 19 Feb 2024 at 00:56, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
> On 2/18/24 03:30, Li Japin wrote:
>>
>> It seems we need to change MAX_SIMUL_LWLOCKS if we enlarge NUM_BUFFER_PARTITIONS,
>> but I didn't find any comments describing the relation between MAX_SIMUL_LWLOCKS and
>> NUM_BUFFER_PARTITIONS. Am I missing something?
>
> IMHO the relationship is pretty simple - MAX_SIMUL_LWLOCKS needs to be
> higher than NUM_BUFFER_PARTITIONS, so that the backend can acquire all
> the partition locks if needed.
>

Thanks for the explanation!  Got it.

> There are other places that acquire a bunch of locks, and all of them need
> to be careful not to exceed MAX_SIMUL_LWLOCKS. For example GiST has
> GIST_MAX_SPLIT_PAGES.
>
>
> regards



Re: Thoughts about NUM_BUFFER_PARTITIONS

From
wenhui qiu
Date:
Hi Japin Li
   Thank you for such important information! Got it.


Re: Thoughts about NUM_BUFFER_PARTITIONS

From
wenhui qiu
Date:
Hi Heikki Linnakangas
   Looking through the git log, I found this commit: https://github.com/postgres/postgres/commit/3acc10c997f916f6a741d0b4876126b7b08e3892 - I don't seem to see an email discussing it. As the commit log tells us, we don't know exactly how large a value is optimal, and I believe it would be more flexible to make it a parameter. Thank you very much, Tomas Vondra, for explaining the relationship; I see that MAX_SIMUL_LWLOCKS was simply doubled in that commit. Is there a more appropriate ratio between them?



```````````````````````````````````````````````````````````````````````````
commit 3acc10c997f916f6a741d0b4876126b7b08e3892
Author: Robert Haas <rhaas@postgresql.org>
Date:   Thu Oct 2 13:58:50 2014 -0400

    Increase the number of buffer mapping partitions to 128.

    Testing by Amit Kapila, Andres Freund, and myself, with and without
    other patches that also aim to improve scalability, seems to indicate
    that this change is a significant win over the current value and over
    smaller values such as 64.  It's not clear how high we can push this
    value before it starts to have negative side-effects elsewhere, but
    going this far looks OK.

`````````````````````````````````````````````````````````


Re: Thoughts about NUM_BUFFER_PARTITIONS

From
Tomas Vondra
Date:
Hi,

On 2/20/24 03:16, wenhui qiu wrote:
> Hi Heikki Linnakangas
>    Looking through the git log, I found this commit:
> https://github.com/postgres/postgres/commit/3acc10c997f916f6a741d0b4876126b7b08e3892
> - I don't seem to see an email discussing it. As the commit log tells us, we
> don't know exactly how large a value is optimal, and I believe it would be
> more flexible to make it a parameter. Thank you very much, Tomas Vondra, for
> explaining the relationship; I see that MAX_SIMUL_LWLOCKS was simply doubled
> in that commit. Is there a more appropriate ratio between them?
> 

I think the discussion for that commit is in [1] (and especially [2]).

That being said, I don't think MAX_SIMUL_LWLOCKS and NUM_BUFFER_PARTITIONS
need to be in any particular ratio. The only requirement is that there
needs to be enough slack, and 72 locks seemed to work quite fine until
now - I don't think we need to change that.

What might be necessary is improving held_lwlocks - we treat it as LIFO,
but more as an expectation than a hard rule. I'm not sure how often we
violate that rule (if at all), but if we do then it's going to get more
expensive as we increase the number of locks. But I'm not sure this is
actually a problem in practice, we usually hold very few LWLocks at the
same time.
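
To make that concrete, here is a minimal standalone sketch (names and layout are illustrative, not the actual lwlock.c code) of why a non-LIFO release has to scan the per-backend list of held locks, and why that gets more expensive as more locks are held:

#include <stdio.h>

#define MAX_HELD 200                /* stands in for MAX_SIMUL_LWLOCKS */

static int held[MAX_HELD];          /* locks held, in acquisition order */
static int num_held = 0;

static void acquire(int lock_id)
{
    held[num_held++] = lock_id;
}

/* Returns how many entries had to be inspected to find the lock. */
static int release(int lock_id)
{
    int i, steps = 0;

    /* O(1) when released in LIFO order, O(num_held) otherwise. */
    for (i = num_held - 1; i >= 0; i--, steps++)
        if (held[i] == lock_id)
            break;
    if (i < 0)
        return -1;                  /* not held; nothing to do in this sketch */
    for (; i < num_held - 1; i++)   /* close the gap */
        held[i] = held[i + 1];
    num_held--;
    return steps;
}

int main(void)
{
    for (int id = 0; id < 128; id++)    /* e.g. all buffer mapping partitions */
        acquire(id);
    printf("LIFO release of lock 127:   %d steps\n", release(127));
    printf("non-LIFO release of lock 0: %d steps\n", release(0));
    return 0;
}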

As for making this a parameter, I'm rather opposed to the idea. If we
don't have a very clear idea how to set this limit, what's the chance
users with little knowledge of the internals will pick a good value?
Adding yet another knob would just mean users start messing with it in
random ways (typically increasing it to very high value, because "more
is better"), causing more harm than good.

Adding it as a GUC would also require making some parts dynamic (instead
of just doing static allocation with compile-time constants). That's not
great, but I'm not sure how significant the penalty might be.
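
A rough sketch of what "making some parts dynamic" means here (hypothetical, only to illustrate the point, not a proposed patch): with a compile-time constant the structures can be sized statically and the arithmetic folded by the compiler, while a GUC-backed value is only known at startup and forces runtime allocation.

#include <stdio.h>
#include <stdlib.h>

/* Today: a compile-time constant, so arrays can be sized statically. */
#define NUM_BUFFER_PARTITIONS 128
static int partition_counters_static[NUM_BUFFER_PARTITIONS];

/* Hypothetical GUC-backed variant: the value is only known at startup,
 * so the same structure has to be allocated dynamically. */
static int num_buffer_partitions = 128;     /* would be read from the GUC */
static long *partition_counters_dynamic;

int main(void)
{
    partition_counters_dynamic =
        calloc(num_buffer_partitions, sizeof(*partition_counters_dynamic));
    if (partition_counters_dynamic == NULL)
        return 1;

    printf("static size: %d, dynamic size: %d\n",
           NUM_BUFFER_PARTITIONS, num_buffer_partitions);

    (void) partition_counters_static;       /* silence unused warning */
    free(partition_counters_dynamic);
    return 0;
}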


IMHO adding a GUC might be acceptable only if we fail to come up with a
good value (which is going to be a trade off), and if someone
demonstrates a clear benefit of increasing the value (which I don't
think has happened in this thread yet).


regards


[1]
https://www.postgresql.org/message-id/flat/CAA4eK1LSTcMwXNO8ovGh7c0UgCHzGbN%3D%2BPjggfzQDukKr3q_DA%40mail.gmail.com

[2]
https://www.postgresql.org/message-id/CA%2BTgmoY58dQi8Z%3DFDAu4ggxHV-HYV03-R9on1LSP9OJU_fy_zA%40mail.gmail.com

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Thoughts about NUM_BUFFER_PARTITIONS

From
wenhui qiu
Date:
Hi Tomas Vondra
    Thanks for the information! I found that the Postgres Pro Enterprise version has already implemented this; however, it defaults to 16 and maxes out at 128, and the maximum is the same as in PostgreSQL. I kindly hope the developers can explain the purpose of this. Maybe 128 partitions is the optimal value, and the parameter just makes it easier to adjust the number of partitions in the future when it really isn't enough. The code comments also say there is hope to implement the parameter in the future.


( https://postgrespro.com/docs/enterprise/16/runtime-config-locks )


log2_num_lock_partitions (integer) #

This controls how many partitions the shared lock tables are divided into. Number of partitions is calculated by raising 2 to the power of this parameter. The default value is 4, which corresponds to 16 partitions, and the maximum is 8. This parameter can only be set in the postgresql.conf file or on the server command line.


Best wishes



Re: Thoughts about NUM_BUFFER_PARTITIONS

From
Tomas Vondra
Date:
On 2/23/24 15:40, wenhui qiu wrote:
> Hi Tomas Vondra
>     Thanks for the information! I found that the Postgres Pro Enterprise
> version has already implemented this; however, it defaults to 16 and maxes
> out at 128, and the maximum is the same as in PostgreSQL. I kindly hope the
> developers can explain the purpose of this. Maybe 128 partitions is the
> optimal value, and the parameter just makes it easier to adjust the number
> of partitions in the future when it really isn't enough. The code comments
> also say there is hope to implement the parameter in the future.
> 
> 
> ( https://postgrespro.com/docs/enterprise/16/runtime-config-locks )
> 
> 
> log2_num_lock_partitions (integer) #
> <https://postgrespro.com/docs/enterprise/16/runtime-config-locks#GUC-LOG2-NUM-LOCK-PARTITIONS>
> 
> This controls how many partitions the shared lock tables are divided into.
> Number of partitions is calculated by raising 2 to the power of this
> parameter. The default value is 4, which corresponds to 16 partitions, and
> the maximum is 8. This parameter can only be set in the postgresql.conf file
> or on the server command line.
> 
> Best wish
> 

Hi,

Well, if Postgres Pro implements this, I don't know what their reasoning
was exactly, but I guess they wanted to make it easier to experiment
with different values (without rebuild), or maybe they actually have
systems where they know higher values help ...

Note: I'd point out that the maximum value 8 translates to 256, so no - it
does not max out at the same value as PostgreSQL.
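
A trivial check of that arithmetic, assuming log2_num_lock_partitions behaves as documented (the partition count is 2 raised to the parameter):

#include <stdio.h>

int main(void)
{
    int log2_default = 4;   /* documented default */
    int log2_max = 8;       /* documented maximum */

    printf("default: 2^%d = %d partitions\n", log2_default, 1 << log2_default);
    printf("maximum: 2^%d = %d partitions\n", log2_max, 1 << log2_max);
    return 0;
}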

Anyway, this value is inherently a trade off. If it wasn't, we'd set it
to something super high from the start. But having more partitions of
the lock table has a cost too, because some parts need to acquire all
the partition locks (and that's O(N) where N = number of partitions).

Of course, not having enough lock table partitions has a cost too,
because it increases the chance of conflict between backends (who happen
to need to operate on the same partition). This constant is not
constant, it changes over time - with 16 cores the collisions might have
been rare, with 128 not so much. Also, with partitioning we may need
many more locks per query.

This means it's entirely possible it'd be good to have more than 128
partitions of the lock table, but we only change this kind of stuff if
we have 2 things:

1) clear demonstration of the benefits (e.g. a workload showing an
improvement with higher number of partitions)

2) analysis of how this affects other workloads (e.g. cases that may
need to lock all the partitions etc)

Ultimately it's a trade off - we need to judge if the impact in (2) is
worth the improvement in (1).

None of this was done in this thread. There's no demonstration of the
benefits, no analysis of the impact etc.

As for turning the parameter into a GUC, that has a cost too. Either
direct - a compiler can do far more optimizations with compile-time
constants than with values that may change during execution, for
example. Or indirect - if we can't give users any guidance how/when to
tune the GUC, it can easily lead to misconfiguration (I can't even count
how many times I had to deal with systems where the values were "tuned"
following the logic that more is always better).

Which just leads back to (1) and (2) even for this case.


regards


-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Thoughts about NUM_BUFFER_PARTITIONS

From
"Andrey M. Borodin"
Date:
One of our customers recently asked me to look into buffer mapping.
Following is my POV on the problem of optimal NUM_BUFFER_PARTITIONS.

I’ve found some dead code: BufMappingPartitionLockByIndex() is unused, and has been unused for a long time. See patch 1.

> On 23 Feb 2024, at 22:25, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
>
> Well, if Postgres Pro implements this, I don't know what their reasoning
> was exactly, but I guess they wanted to make it easier to experiment
> with different values (without rebuild), or maybe they actually have
> systems where they know higher values help ...
>
> Note: I'd point out that the maximum value 8 translates to 256, so no - it
> does not max out at the same value as PostgreSQL.

I’ve prototyped a similar GUC for anyone willing to do such experiments. See patches 2 and 4. I’ll probably do some
experiments too, on customers' clusters and workloads :)

> Anyway, this value is inherently a trade off. If it wasn't, we'd set it
> to something super high from the start. But having more partitions of
> the lock table has a cost too, because some parts need to acquire all
> the partition locks (and that's O(N) where N = number of partitions).

I’ve found no such cases, actually. Or perhaps I was not looking hard enough.
pg_buffercache iterates over buffers and releases locks. See patch 3, which fixes the comments.

> Of course, not having enough lock table partitions has a cost too,
> because it increases the chance of conflict between backends (who happen
> to need to operate on the same partition). This constant is not
> constant, it changes over time - with 16 cores the collisions might have
> been rare, with 128 not so much. Also, with partitioning we may need
> many more locks per query.
>
> This means it's entirely possible it'd be good to have more than 128
> partitions of the lock table, but we only change this kind of stuff if
> we have 2 things:
>
> 1) clear demonstration of the benefits (e.g. a workload showing an
> improvement with higher number of partitions)
>
> 2) analysis of how this affects other workloads (e.g. cases that may
> need to lock all the partitions etc)
>
> Ultimately it's a trade off - we need to judge if the impact in (2) is
> worth the improvement in (1).
>
> None of this was done in this thread. There's no demonstration of the
> benefits, no analysis of the impact etc.
>
> As for turning the parameter into a GUC, that has a cost too. Either
> direct - a compiler can do far more optimizations with compile-time
> constants than with values that may change during execution, for
> example.

I think the overhead of finding a partition by hash code is negligibly small.
num_partitions in HTAB controls the number of freelists; this might have some effect.

> Or indirect - if we can't give users any guidance how/when to
> tune the GUC, it can easily lead to misconfiguration (I can't even count
> how many times I had to deal with systems where the values were "tuned"
> following the logic that more is always better).

Yes, IMHO this argument is the most important one. By adding more such knobs we promote a superstitious approach to tuning.


Best regards, Andrey Borodin.

Attachment

Re: Thoughts about NUM_BUFFER_PARTITIONS

From
Nathan Bossart
Date:
On Mon, Aug 05, 2024 at 12:32:20AM +0500, Andrey M. Borodin wrote:
> I've found some dead code: BufMappingPartitionLockByIndex() is unused,
> and has been unused for a long time. See patch 1.

I don't see a reason to also get rid of BufTableHashPartition(), but
otherwise this looks reasonable to me.  It would probably be a good idea to
first check whether there are any external callers we can find.

> I've prototyped a similar GUC for anyone willing to do such experiments.
> See patches 2 and 4. I'll probably do some experiments too, on customers'
> clusters and workloads :)

Like Tomas, I'm not too wild about making this a GUC.  And as Heikki
pointed out upthread, a good first step would be to benchmark different
NUM_BUFFER_PARTITIONS settings on modern hardware.  I suspect the current
setting is much lower than optimal (e.g., I doubled it and saw a TPS boost
for read-only pgbench on an i5-13500T), but I don't think we fully
understand the performance characteristics with different settings yet.  If
we find that the ideal setting depends heavily on workload/hardware, then
perhaps we can consider adding a GUC, but I don't think we are there yet.

>> Anyway, this value is inherently a trade off. If it wasn't, we'd set it
>> to something super high from the start. But having more partitions of
>> the lock table has a cost too, because some parts need to acquire all
>> the partition locks (and that's O(N) where N = number of partitions).
> 
> I've found no such cases, actually. Or perhaps I was not looking hard
> enough. pg_buffercache iterates over buffers and releases locks. See
> patch 3, which fixes the comments.

Yeah, I think 0003 is reasonable, too.  pg_buffercache stopped acquiring
all the partition locks in v10 (see commit 6e65454), which is also the
commit that removed all remaining uses of BufMappingPartitionLockByIndex().
In fact, I think BufMappingPartitionLockByIndex() was introduced just for
this pg_buffercache use-case (see commit ea9df81). 

-- 
nathan