Thread: Thoughts about NUM_BUFFER_PARTITIONS
Hi hackers,
While reading this document (https://www.interdb.jp/pg/pgsql08/03.html) I came across the following paragraph:
/*
The BufMappingLock is split into partitions to reduce contention in the buffer table (the default is 128 partitions). Each BufMappingLock partition guards a portion of the corresponding hash bucket slots.
*/
Physical servers with terabytes of RAM are now commonplace, so I looked at the comments in the source code to see whether they still match current hardware capabilities. The comment says this could become a runtime parameter in the future. I am a DBA, and I have tried to think through implementing this parameter myself, but I am not a professional kernel developer, so I would like a senior community developer to implement it.
/*
* It's a bit odd to declare NUM_BUFFER_PARTITIONS and NUM_LOCK_PARTITIONS
* here, but we need them to figure out offsets within MainLWLockArray, and
* having this file include lock.h or bufmgr.h would be backwards.
*/
/* Number of partitions of the shared buffer mapping hashtable */
#define NUM_BUFFER_PARTITIONS 128
/*
* The number of partitions for locking purposes. This is set to match
* NUM_BUFFER_PARTITIONS for now, on the basis that whatever's good enough for
* the buffer pool must be good enough for any other purpose. This could
* become a runtime parameter in future.
*/
#define DSHASH_NUM_PARTITIONS_LOG2 7
#define DSHASH_NUM_PARTITIONS (1 << DSHASH_NUM_PARTITIONS_LOG2)
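For reference, the way a buffer lookup picks its partition lock is essentially a modulo over the buffer tag's hash code. The following is a minimal standalone sketch of that step (an illustration only, not the actual PostgreSQL source; the hash values are made up):
/* Minimal standalone sketch (not PostgreSQL source): a buffer tag's hash
 * code selects one of NUM_BUFFER_PARTITIONS BufMappingLock partitions,
 * essentially via hashcode % NUM_BUFFER_PARTITIONS. */
#include <stdio.h>
#include <stdint.h>

#define NUM_BUFFER_PARTITIONS 128

static int
partition_for_hash(uint32_t hashcode)
{
    return hashcode % NUM_BUFFER_PARTITIONS;
}

int
main(void)
{
    /* hypothetical hash codes standing in for three buffer tags */
    uint32_t hashes[] = {0x1a2b3c4d, 0x00000080, 0xdeadbeef};

    for (int i = 0; i < 3; i++)
        printf("hash %08x -> partition %d\n",
               (unsigned) hashes[i], partition_for_hash(hashes[i]));
    return 0;
}
All backends that map buffers whose tags hash to the same partition contend on the same BufMappingLock, which is what the rest of the thread is about.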
On 08/02/2024 12:17, wenhui qiu wrote:
> Hi hackers,
> While reading this document (https://www.interdb.jp/pg/pgsql08/03.html)
> I came across the following paragraph:
> /*
> The BufMappingLock is split into partitions to reduce contention in the
> buffer table (the default is 128 partitions). Each BufMappingLock
> partition guards a portion of the corresponding hash bucket slots.
> */
>
> Physical servers with terabytes of RAM are now commonplace, so I looked
> at the comments in the source code to see whether they still match
> current hardware capabilities.

The optimal number of partitions has more to do with the number of
concurrent processes using the buffer cache, rather than the size of the
cache. The number of CPUs in servers has increased too, but not as much
as RAM.

But yeah, it's a valid question if the settings still make sense with
modern hardware.

> The comment says this could become a runtime parameter in the future.
> I am a DBA, and I have tried to think through implementing this
> parameter myself, but I am not a professional kernel developer, so I
> would like a senior community developer to implement it.

The first thing to do would be to benchmark with different
NUM_BUFFER_PARTITIONS settings, and see if there's benefit in having
more partitions.

-- 
Heikki Linnakangas
Neon (https://neon.tech)
Hi Heikki Linnakangas
I think the larger the shared buffers, the higher the probability of multiple backend processes accessing the same bucket slot's BufMappingLock simultaneously (see InitBufTable(NBuffers + NUM_BUFFER_PARTITIONS)). When I have free time, I want to run this test. I have seen some tests, but the result report is in Chinese.
Best wishes
Heikki Linnakangas <hlinnaka@iki.fi> wrote on Thu, Feb 8, 2024 at 19:26:
On 08/02/2024 12:17, wenhui qiu wrote:
> Hi hackers,
> While reading this document (https://www.interdb.jp/pg/pgsql08/03.html)
> I came across the following paragraph:
> /*
> The BufMappingLock is split into partitions to reduce contention in the
> buffer table (the default is 128 partitions). Each BufMappingLock
> partition guards a portion of the corresponding hash bucket slots.
> */
>
> Physical servers with terabytes of RAM are now commonplace, so I looked
> at the comments in the source code to see whether they still match
> current hardware capabilities.
The optimal number of partitions has more to do with the number of
concurrent processes using the buffer cache, rather than the size of the
cache. The number of CPUs in servers has increased too, but not as much
as RAM.
But yeah, it's a valid question if the settings still make sense with
modern hardware.
> The comment says this could become a runtime parameter in the future.
> I am a DBA, and I have tried to think through implementing this
> parameter myself, but I am not a professional kernel developer, so I
> would like a senior community developer to implement it.
The first thing to do would be to benchmark with different
NUM_BUFFER_PARTITIONS settings, and see if there's benefit in having
more partitions.
--
Heikki Linnakangas
Neon (https://neon.tech)
On 2/8/24 14:27, wenhui qiu wrote:
> Hi Heikki Linnakangas
> I think the larger the shared buffers, the higher the probability of
> multiple backend processes accessing the same bucket slot's
> BufMappingLock simultaneously (see InitBufTable(NBuffers +
> NUM_BUFFER_PARTITIONS)). When I have free time, I want to run this
> test. I have seen some tests, but the result report is in Chinese.
>

I think Heikki is right this is unrelated to the amount of RAM. The
partitions are meant to reduce the number of lock collisions when
multiple processes try to map a buffer concurrently. But the machines
got much larger in this regard too - in 2006 the common CPUs had maybe
2-4 cores, now it's common to have CPUs with ~100 cores, and systems
with multiple of them.

OTOH the time spent holding the partition lock should be pretty low,
IIRC we pretty much just pin the buffer before releasing it, and the
backend should do plenty other expensive stuff. So who knows how many
backends end up doing the locking at the same time.

OTOH, with 128 partitions it takes just 14 backends to have 50% chance
of a conflict, so with enough cores ... But how many partitions would be
enough? With 1024 partitions it still takes only 38 backends to get 50%
chance of a collision. Better, but considering we now have hundreds of
cores, not sure if sufficient.

(Obviously, we probably want much lower probability of a collision, I
only used 50% to illustrate the changes).

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
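For anyone who wants to reproduce the 14- and 38-backend figures above, here is a small standalone C sketch of the birthday-style estimate, assuming each concurrent lookup hashes to a uniformly random partition:
/* Standalone sketch: probability that at least two of n backends hit the
 * same BufMappingLock partition at once, assuming uniformly random
 * partition assignment (the classic birthday-problem estimate). */
#include <stdio.h>

static double
collision_probability(int backends, int partitions)
{
    double p_none = 1.0;

    for (int i = 1; i < backends; i++)
        p_none *= 1.0 - (double) i / partitions;
    return 1.0 - p_none;
}

int
main(void)
{
    /* the two cases discussed above */
    printf("128 partitions, 14 backends: %.2f\n", collision_probability(14, 128));
    printf("1024 partitions, 38 backends: %.2f\n", collision_probability(38, 1024));
    return 0;
}
Both cases print roughly 0.5, matching the 50% figures quoted above.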
> On Feb 10, 2024, at 20:15, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
>
> On 2/8/24 14:27, wenhui qiu wrote:
>> Hi Heikki Linnakangas
>> I think the larger the shared buffers, the higher the probability of
>> multiple backend processes accessing the same bucket slot's
>> BufMappingLock simultaneously (see InitBufTable(NBuffers +
>> NUM_BUFFER_PARTITIONS)). When I have free time, I want to run this
>> test. I have seen some tests, but the result report is in Chinese.
>>
>
> I think Heikki is right this is unrelated to the amount of RAM. The
> partitions are meant to reduce the number of lock collisions when
> multiple processes try to map a buffer concurrently. But the machines
> got much larger in this regard too - in 2006 the common CPUs had maybe
> 2-4 cores, now it's common to have CPUs with ~100 cores, and systems
> with multiple of them. OTOH the time spent holding the partition lock
> should be pretty low, IIRC we pretty much just pin the buffer before
> releasing it, and the backend should do plenty other expensive stuff.
> So who knows how many backends end up doing the locking at the same time.
>
> OTOH, with 128 partitions it takes just 14 backends to have 50% chance
> of a conflict, so with enough cores ... But how many partitions would be
> enough? With 1024 partitions it still takes only 38 backends to get 50%
> chance of a collision. Better, but considering we now have hundreds of
> cores, not sure if sufficient.
>
> (Obviously, we probably want much lower probability of a collision, I
> only used 50% to illustrate the changes).
>

I find it seems we need to change MAX_SIMUL_LWLOCKS if we enlarge
NUM_BUFFER_PARTITIONS, but I didn't find any comments describing the
relation between MAX_SIMUL_LWLOCKS and NUM_BUFFER_PARTITIONS. Am I
missing something?
On 2/18/24 03:30, Li Japin wrote:
>
>> On Feb 10, 2024, at 20:15, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
>>
>> On 2/8/24 14:27, wenhui qiu wrote:
>>> Hi Heikki Linnakangas
>>> I think the larger the shared buffers, the higher the probability of
>>> multiple backend processes accessing the same bucket slot's
>>> BufMappingLock simultaneously (see InitBufTable(NBuffers +
>>> NUM_BUFFER_PARTITIONS)). When I have free time, I want to run this
>>> test. I have seen some tests, but the result report is in Chinese.
>>>
>>
>> I think Heikki is right this is unrelated to the amount of RAM. The
>> partitions are meant to reduce the number of lock collisions when
>> multiple processes try to map a buffer concurrently. But the machines
>> got much larger in this regard too - in 2006 the common CPUs had maybe
>> 2-4 cores, now it's common to have CPUs with ~100 cores, and systems
>> with multiple of them. OTOH the time spent holding the partition lock
>> should be pretty low, IIRC we pretty much just pin the buffer before
>> releasing it, and the backend should do plenty other expensive stuff.
>> So who knows how many backends end up doing the locking at the same time.
>>
>> OTOH, with 128 partitions it takes just 14 backends to have 50% chance
>> of a conflict, so with enough cores ... But how many partitions would be
>> enough? With 1024 partitions it still takes only 38 backends to get 50%
>> chance of a collision. Better, but considering we now have hundreds of
>> cores, not sure if sufficient.
>>
>> (Obviously, we probably want much lower probability of a collision, I
>> only used 50% to illustrate the changes).
>>
>
> I find it seems we need to change MAX_SIMUL_LWLOCKS if we enlarge
> NUM_BUFFER_PARTITIONS, but I didn't find any comments describing the
> relation between MAX_SIMUL_LWLOCKS and NUM_BUFFER_PARTITIONS. Am I
> missing something?

IMHO the relationship is pretty simple - MAX_SIMUL_LWLOCKS needs to be
higher than NUM_BUFFER_PARTITIONS, so that the backend can acquire all
the partition locks if needed.

There's other places that acquire a bunch of locks, and all of them need
to be careful not to exceed MAX_SIMUL_LWLOCKS. For example gist has
GIST_MAX_SPLIT_PAGES.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
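To make that invariant concrete, here is a minimal compile-time sketch of it (the 200 is an assumed value for MAX_SIMUL_LWLOCKS, used only for illustration; the authoritative definition lives in lwlock.c):
/* Sketch only, not PostgreSQL source (requires C11 for static_assert):
 * MAX_SIMUL_LWLOCKS must exceed NUM_BUFFER_PARTITIONS so a single backend
 * can hold every buffer mapping partition lock and still have slack for a
 * few unrelated LWLocks. */
#include <assert.h>

#define NUM_BUFFER_PARTITIONS 128
#define MAX_SIMUL_LWLOCKS     200   /* assumed value for this sketch */

static_assert(MAX_SIMUL_LWLOCKS > NUM_BUFFER_PARTITIONS,
              "a backend must be able to hold all partition locks at once");
With these assumed numbers the slack is 72 locks, which matches the figure Tomas mentions later in the thread.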
On Mon, 19 Feb 2024 at 00:56, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
> On 2/18/24 03:30, Li Japin wrote:
>>
>> I find it seems we need to change MAX_SIMUL_LWLOCKS if we enlarge
>> NUM_BUFFER_PARTITIONS, but I didn't find any comments describing the
>> relation between MAX_SIMUL_LWLOCKS and NUM_BUFFER_PARTITIONS. Am I
>> missing something?
>
> IMHO the relationship is pretty simple - MAX_SIMUL_LWLOCKS needs to be
> higher than NUM_BUFFER_PARTITIONS, so that the backend can acquire all
> the partition locks if needed.
>

Thanks for the explanation! Got it.

> There's other places that acquire a bunch of locks, and all of them need
> to be careful not to exceed MAX_SIMUL_LWLOCKS. For example gist has
> GIST_MAX_SPLIT_PAGES.
>
>
> regards
Hi Japin Li
Thank you for such important information! Got it.
Japin Li <japinli@hotmail.com> wrote on Mon, Feb 19, 2024 at 10:26:
On Mon, 19 Feb 2024 at 00:56, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
> On 2/18/24 03:30, Li Japin wrote:
>>
>> I find it seems we need to change MAX_SIMUL_LWLOCKS if we enlarge
>> NUM_BUFFER_PARTITIONS, but I didn't find any comments describing the
>> relation between MAX_SIMUL_LWLOCKS and NUM_BUFFER_PARTITIONS. Am I
>> missing something?
>
> IMHO the relationship is pretty simple - MAX_SIMUL_LWLOCKS needs to be
> higher than NUM_BUFFER_PARTITIONS, so that the backend can acquire all
> the partition locks if needed.
>
Thanks for the explanation! Got it.
> There's other places that acquire a bunch of locks, and all of them need
> to be careful not to exceed MAX_SIMUL_LWLOCKS. For example gist has
> GIST_MAX_SPLIT_PAGES.
>
>
> regards
Hi Heikki Linnakangas
I searched the git log and found this commit: https://github.com/postgres/postgres/commit/3acc10c997f916f6a741d0b4876126b7b08e3892. I don't see an email discussing this commit. As the commit log tells us, we don't know exactly how large a value is optimal, and I believe it would be more flexible to make it a parameter. Thank you very much, Tomas Vondra, for explaining the relationship. I see that MAX_SIMUL_LWLOCKS was just doubled in this commit; is there a more appropriate ratio between them?
```````````````````````````````````````````````````````````````````````````
commit 3acc10c997f916f6a741d0b4876126b7b08e3892
Author: Robert Haas <rhaas@postgresql.org>
Date: Thu Oct 2 13:58:50 2014 -0400
Increase the number of buffer mapping partitions to 128.
Testing by Amit Kapila, Andres Freund, and myself, with and without
other patches that also aim to improve scalability, seems to indicate
that this change is a significant win over the current value and over
smaller values such as 64. It's not clear how high we can push this
value before it starts to have negative side-effects elsewhere, but
going this far looks OK.
`````````````````````````````````````````````````````````
wenhui qiu <qiuwenhuifx@gmail.com> wrote on Tue, Feb 20, 2024 at 09:36:
Hi Japin Li
Thank you for such important information! Got it.
Japin Li <japinli@hotmail.com> wrote on Mon, Feb 19, 2024 at 10:26:
On Mon, 19 Feb 2024 at 00:56, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
> On 2/18/24 03:30, Li Japin wrote:
>>
>> I find it seems we need to change MAX_SIMUL_LWLOCKS if we enlarge
>> NUM_BUFFER_PARTITIONS, but I didn't find any comments describing the
>> relation between MAX_SIMUL_LWLOCKS and NUM_BUFFER_PARTITIONS. Am I
>> missing something?
>
> IMHO the relationship is pretty simple - MAX_SIMUL_LWLOCKS needs to be
> higher than NUM_BUFFER_PARTITIONS, so that the backend can acquire all
> the partition locks if needed.
>
Thanks for the explanation! Got it.
> There's other places that acquire a bunch of locks, and all of them need
> to be careful not to exceed MAX_SIMUL_LWLOCKS. For example gist has
> GIST_MAX_SPLIT_PAGES.
>
>
> regards
Hi,

On 2/20/24 03:16, wenhui qiu wrote:
> Hi Heikki Linnakangas
> I searched the git log and found this commit:
> https://github.com/postgres/postgres/commit/3acc10c997f916f6a741d0b4876126b7b08e3892
> I don't see an email discussing this commit. As the commit log tells us,
> we don't know exactly how large a value is optimal, and I believe it
> would be more flexible to make it a parameter. Thank you very much,
> Tomas Vondra, for explaining the relationship. I see that
> MAX_SIMUL_LWLOCKS was just doubled in this commit; is there a more
> appropriate ratio between them?
>

I think the discussion for that commit is in [1] (and especially [2]).

That being said, I don't think MAX_SIMUL_LWLOCKS and NUM_BUFFER_PARTITIONS
need to be in any particular ratio. The only requirement is that there
needs to be enough slack, and 72 locks seemed to work quite fine until
now - I don't think we need to change that.

What might be necessary is improving held_lwlocks - we treat it as LIFO,
but more as an expectation than a hard rule. I'm not sure how often we
violate that rule (if at all), but if we do then it's going to get more
expensive as we increase the number of locks. But I'm not sure this is
actually a problem in practice, we usually hold very few LWLocks at the
same time.

As for making this a parameter, I'm rather opposed to the idea. If we
don't have a very clear idea how to set this limit, what's the chance
users with little knowledge of the internals will pick a good value?
Adding yet another knob would just mean users start messing with it in
random ways (typically increasing it to very high value, because "more
is better"), causing more harm than good.

Adding it as a GUC would also require making some parts dynamic (instead
of just doing static allocation with compile-time constants). That's not
great, but I'm not sure how significant the penalty might be.

IMHO adding a GUC might be acceptable only if we fail to come up with a
good value (which is going to be a trade off), and if someone
demonstrates a clear benefit of increasing the value (which I don't
think has happened in this thread yet).

regards

[1]
https://www.postgresql.org/message-id/flat/CAA4eK1LSTcMwXNO8ovGh7c0UgCHzGbN%3D%2BPjggfzQDukKr3q_DA%40mail.gmail.com

[2]
https://www.postgresql.org/message-id/CA%2BTgmoY58dQi8Z%3DFDAu4ggxHV-HYV03-R9on1LSP9OJU_fy_zA%40mail.gmail.com

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi Tomas Vondra
Thanks for the information! I found that the Postgres Pro Enterprise version has already implemented this; however, it defaults to 16 and maxes out at 128, and the maximums are the same as in PostgreSQL. I kindly hope the developers can explain what the purpose of this is. Maybe 128 partitions is the optimal value, and the parameter just makes it easier to adjust the number of partitions in the future when it's really not enough. The code comments also say they hope to implement the parameter in the future.
( https://postgrespro.com/docs/enterprise/16/runtime-config-locks )
log2_num_lock_partitions (integer) #
This controls how many partitions the shared lock tables are divided into. Number of partitions is calculated by raising 2 to the power of this parameter. The default value is 4, which corresponds to 16 partitions, and the maximum is 8. This parameter can only be set in the postgresql.conf file or on the server command line.
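For illustration only (this is not part of the quoted documentation): the log2-style setting maps to a partition count by a left shift, mirroring the DSHASH_NUM_PARTITIONS_LOG2 definition quoted at the start of the thread. A tiny standalone sketch:
/* Standalone sketch: how a log2-style knob translates into a partition
 * count, covering the documented range (default 4 -> 16, maximum 8 -> 256). */
#include <stdio.h>

int
main(void)
{
    for (int log2_parts = 4; log2_parts <= 8; log2_parts++)
        printf("log2_num_lock_partitions = %d -> %d partitions\n",
               log2_parts, 1 << log2_parts);
    return 0;
}
Note that the maximum of 8 corresponds to 256 partitions, a point picked up below.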
Best wishes
On Tue, 20 Feb 2024 at 21:55, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
Hi,
On 2/20/24 03:16, wenhui qiu wrote:
> Hi Heikki Linnakangas
> I searched the git log and found this commit:
> https://github.com/postgres/postgres/commit/3acc10c997f916f6a741d0b4876126b7b08e3892
> I don't see an email discussing this commit. As the commit log tells us,
> we don't know exactly how large a value is optimal, and I believe it
> would be more flexible to make it a parameter. Thank you very much,
> Tomas Vondra, for explaining the relationship. I see that
> MAX_SIMUL_LWLOCKS was just doubled in this commit; is there a more
> appropriate ratio between them?
>
I think the discussion for that commit is in [1] (and especially [2]).
That being said, I don't think MAX_SIMUL_LWLOCKS and NUM_BUFFER_PARTITIONS
need to be in any particular ratio. The only requirement is that there
needs to be enough slack, and 72 locks seemed to work quite fine until
now - I don't think we need to change that.
What might be necessary is improving held_lwlocks - we treat it as LIFO,
but more as an expectation than a hard rule. I'm not sure how often we
violate that rule (if at all), but if we do then it's going to get more
expensive as we increase the number of locks. But I'm not sure this is
actually a problem in practice, we usually hold very few LWLocks at the
same time.
As for making this a parameter, I'm rather opposed to the idea. If we
don't have a very clear idea how to set this limit, what's the chance
users with little knowledge of the internals will pick a good value?
Adding yet another knob would just mean users start messing with it in
random ways (typically increasing it to very high value, because "more
is better"), causing more harm than good.
Adding it as a GUC would also require making some parts dynamic (instead
of just doing static allocation with compile-time constants). That's not
great, but I'm not sure how significant the penalty might be.
IMHO adding a GUC might be acceptable only if we fail to come up with a
good value (which is going to be a trade off), and if someone
demonstrates a clear benefit of increasing the value (which I don't
think has happened in this thread yet).
regards
[1]
https://www.postgresql.org/message-id/flat/CAA4eK1LSTcMwXNO8ovGh7c0UgCHzGbN%3D%2BPjggfzQDukKr3q_DA%40mail.gmail.com
[2]
https://www.postgresql.org/message-id/CA%2BTgmoY58dQi8Z%3DFDAu4ggxHV-HYV03-R9on1LSP9OJU_fy_zA%40mail.gmail.com
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2/23/24 15:40, wenhui qiu wrote:
> Hi Tomas Vondra
> Thanks for the information! I found that the Postgres Pro Enterprise
> version has already implemented this; however, it defaults to 16 and
> maxes out at 128, and the maximums are the same as in PostgreSQL. I
> kindly hope the developers can explain what the purpose of this is.
> Maybe 128 partitions is the optimal value, and the parameter just makes
> it easier to adjust the number of partitions in the future when it's
> really not enough. The code comments also say they hope to implement
> the parameter in the future.
>
> ( https://postgrespro.com/docs/enterprise/16/runtime-config-locks )
>
> log2_num_lock_partitions (integer) #
> <https://postgrespro.com/docs/enterprise/16/runtime-config-locks#GUC-LOG2-NUM-LOCK-PARTITIONS>
>
> This controls how many partitions the shared lock tables are divided into.
> Number of partitions is calculated by raising 2 to the power of this
> parameter. The default value is 4, which corresponds to 16 partitions, and
> the maximum is 8. This parameter can only be set in the postgresql.conf file
> or on the server command line.
>
> Best wishes
>

Hi,

Well, if Postgres Pro implements this, I don't know what their reasoning
was exactly, but I guess they wanted to make it easier to experiment
with different values (without rebuild), or maybe they actually have
systems where they know higher values help ...

Note: I'd point out that the maximum value 8 translates to 256, so no - it
does not max out at the same value as PostgreSQL.

Anyway, this value is inherently a trade off. If it wasn't, we'd set it
to something super high from the start. But having more partitions of
the lock table has a cost too, because some parts need to acquire all
the partition locks (and that's O(N) where N = number of partitions).

Of course, not having enough lock table partitions has a cost too,
because it increases the chance of conflict between backends (who happen
to need to operate on the same partition). This constant is not
constant, it changes over time - with 16 cores the collisions might have
been rare, with 128 not so much. Also, with partitioning we may need
many more locks per query.

This means it's entirely possible it'd be good to have more than 128
partitions of the lock table, but we only change this kind of stuff if
we have 2 things:

1) clear demonstration of the benefits (e.g. a workload showing an
improvement with higher number of partitions)

2) analysis of how this affects other workloads (e.g. cases that may
need to lock all the partitions etc)

Ultimately it's a trade off - we need to judge if the impact in (2) is
worth the improvement in (1).

None of this was done in this thread. There's no demonstration of the
benefits, no analysis of the impact etc.

As for turning the parameter into a GUC, that has a cost too. Either
direct - a compiler can do far more optimizations with compile-time
constants than with values that may change during execution, for
example. Or indirect - if we can't give users any guidance how/when to
tune the GUC, it can easily lead to misconfiguration (I can't even count
how many times I had to deal with systems where the values were "tuned"
following the logic that more is always better).

Which just leads back to (1) and (2) even for this case.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
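To illustrate the O(N) pattern meant above (which, per the later messages, pg_buffercache actually used before v10), here is a self-contained toy in C, not PostgreSQL code, that takes every partition lock in index order before a full scan (compile with -pthread):
/* Self-contained toy, not PostgreSQL source: the "acquire all partition
 * locks" pattern whose cost grows linearly with the partition count.
 * Locks are always taken in ascending index order so two concurrent
 * full scans cannot deadlock. */
#include <pthread.h>
#include <stdio.h>

#define NUM_PARTITIONS 128

static pthread_mutex_t partition_lock[NUM_PARTITIONS];

static void
scan_all_partitions(void)
{
    for (int i = 0; i < NUM_PARTITIONS; i++)          /* O(N) acquire */
        pthread_mutex_lock(&partition_lock[i]);

    /* ... inspect every hash bucket while everything is locked ... */

    for (int i = NUM_PARTITIONS - 1; i >= 0; i--)     /* release in reverse */
        pthread_mutex_unlock(&partition_lock[i]);
}

int
main(void)
{
    for (int i = 0; i < NUM_PARTITIONS; i++)
        pthread_mutex_init(&partition_lock[i], NULL);

    scan_all_partitions();
    printf("scanned while holding %d partition locks\n", NUM_PARTITIONS);
    return 0;
}
Contrast this with the normal lookup path, which touches exactly one partition lock per buffer mapping.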
One of our customers recently asked me to look into buffer mapping.
Following is my POV on the problem of optimal NUM_BUFFER_PARTITIONS.

I've found some dead code: BufMappingPartitionLockByIndex() is unused, and
has been unused for a long time. See patch 1.

> On 23 Feb 2024, at 22:25, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
>
> Well, if Postgres Pro implements this, I don't know what their reasoning
> was exactly, but I guess they wanted to make it easier to experiment
> with different values (without rebuild), or maybe they actually have
> systems where they know higher values help ...
>
> Note: I'd point out that the maximum value 8 translates to 256, so no - it
> does not max out at the same value as PostgreSQL.

I've prototyped a similar GUC for anyone willing to do such experiments.
See patches 2 and 4. Probably, I'll do some experiments too, on customers'
clusters and workloads :)

> Anyway, this value is inherently a trade off. If it wasn't, we'd set it
> to something super high from the start. But having more partitions of
> the lock table has a cost too, because some parts need to acquire all
> the partition locks (and that's O(N) where N = number of partitions).

I've found no such cases, actually. Or, perhaps, I was not looking hard
enough. pg_buffercache iterates over buffers and releases locks. See
patch 3 to fix comments.

> Of course, not having enough lock table partitions has a cost too,
> because it increases the chance of conflict between backends (who happen
> to need to operate on the same partition). This constant is not
> constant, it changes over time - with 16 cores the collisions might have
> been rare, with 128 not so much. Also, with partitioning we may need
> many more locks per query.
>
> This means it's entirely possible it'd be good to have more than 128
> partitions of the lock table, but we only change this kind of stuff if
> we have 2 things:
>
> 1) clear demonstration of the benefits (e.g. a workload showing an
> improvement with higher number of partitions)
>
> 2) analysis of how this affects other workloads (e.g. cases that may
> need to lock all the partitions etc)
>
> Ultimately it's a trade off - we need to judge if the impact in (2) is
> worth the improvement in (1).
>
> None of this was done in this thread. There's no demonstration of the
> benefits, no analysis of the impact etc.
>
> As for turning the parameter into a GUC, that has a cost too. Either
> direct - a compiler can do far more optimizations with compile-time
> constants than with values that may change during execution, for
> example.

I think the overhead of finding a partition by hash is negligibly small.
num_partitions in HTAB controls the number of freelists. This might have
some effect.

> Or indirect - if we can't give users any guidance how/when to
> tune the GUC, it can easily lead to misconfiguration (I can't even count
> how many times I had to deal with systems where the values were "tuned"
> following the logic that more is always better).

Yes, this argument IMHO is the most important one. By adding more such
knobs we promote a superstitious approach to tuning.

Best regards, Andrey Borodin.
On Mon, Aug 05, 2024 at 12:32:20AM +0500, Andrey M. Borodin wrote:
> I've found some dead code: BufMappingPartitionLockByIndex() is unused,
> and has been unused for a long time. See patch 1.

I don't see a reason to also get rid of BufTableHashPartition(), but
otherwise this looks reasonable to me. It would probably be a good idea
to first check whether there are any external callers we can find.

> I've prototyped a similar GUC for anyone willing to do such experiments.
> See patches 2 and 4. Probably, I'll do some experiments too, on
> customers' clusters and workloads :)

Like Tomas, I'm not too wild about making this a GUC. And as Heikki
pointed out upthread, a good first step would be to benchmark different
NUM_BUFFER_PARTITIONS settings on modern hardware. I suspect the current
setting is much lower than optimal (e.g., I doubled it and saw a TPS boost
for read-only pgbench on an i5-13500T), but I don't think we fully
understand the performance characteristics with different settings yet.
If we find that the ideal setting depends heavily on workload/hardware,
then perhaps we can consider adding a GUC, but I don't think we are there
yet.

>> Anyway, this value is inherently a trade off. If it wasn't, we'd set it
>> to something super high from the start. But having more partitions of
>> the lock table has a cost too, because some parts need to acquire all
>> the partition locks (and that's O(N) where N = number of partitions).
>
> I've found no such cases, actually. Or, perhaps, I was not looking hard
> enough. pg_buffercache iterates over buffers and releases locks. See
> patch 3 to fix comments.

Yeah, I think 0003 is reasonable, too. pg_buffercache stopped acquiring
all the partition locks in v10 (see commit 6e65454), which is also the
commit that removed all remaining uses of BufMappingPartitionLockByIndex().
In fact, I think BufMappingPartitionLockByIndex() was introduced just for
this pg_buffercache use-case (see commit ea9df81).

-- 
nathan