Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock - Mailing list pgsql-hackers
From | Robert Haas
---|---
Subject | Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock
Msg-id | CA+TgmoaYPeo+NB__teHfAuJ2K4zs8izTggXDATe35yn6fq_LmA@mail.gmail.com
In response to | Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock (Dilip Kumar <dilipbalaut@gmail.com>)
List | pgsql-hackers |
On Wed, Nov 8, 2023 at 6:41 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> Here is the updated version of the patch. Here I have taken the
> approach suggested by Andrey, and I discussed the same with Alvaro
> offlist and he also agrees with it. So the idea is that we will keep
> the bank size fixed, which is 16 buffers per bank, and the allowed
> GUC value for each SLRU buffer must be a multiple of the bank size.
> We have removed the centralized lock, but instead of one lock per
> bank, we have kept a maximum limit on the number of bank locks,
> which is 128. We kept the max limit as 128 because, in one of the
> operations (i.e. ActivateCommitTs), we need to acquire all the bank
> locks (but this is not a performance path at all), and at a time we
> can acquire a max of 200 LWLocks, so we think this limit of 128 is
> good. So now if the number of banks is <= 128 then we will be using
> one lock per bank; otherwise one lock may protect access to buffers
> in multiple banks.

Just so I understand, I guess this means that an SLRU is limited to 16 * 128 = 2k buffers = 16MB?

When we were talking about this earlier, I suggested fixing the number of banks and allowing the number of buffers per bank to scale depending on the setting. That seemed simpler than allowing both the number of banks and the number of buffers to vary, and it might allow the compiler to optimize some code better, by converting a calculation like page_no % number_of_banks into a masking operation like page_no & 0xf or whatever. However, because it allows an individual bank to become arbitrarily large, it more or less requires us to use a buffer mapping table. Some of the performance problems mentioned could be alleviated by omitting the hash table when the number of buffers per bank is small, and we could also create the dynahash with a custom hash function that just does modular arithmetic on the page number rather than a real hashing operation. However, maybe we don't really need to do any of that. I agree that dynahash is clunky on a good day. I hadn't realized the impact would be so noticeable.

This proposal takes the opposite approach, fixing the number of buffers per bank and letting the number of banks vary (see the sketch below). I think that's probably fine, although it does reduce the effective associativity of the cache. If there are more hot buffers in a bank than the bank size, the bank will be contended, even if other banks are cold. However, given the way SLRUs are accessed, it seems hard to imagine this being a real problem in practice. There aren't likely to be, say, 20 hot buffers that just so happen to all be separated from one another by a number of pages that is a multiple of the configured number of banks. And in the seemingly very unlikely event that you have a workload that behaves like that, you could always adjust the number of banks up or down by one, and the problem would go away. So this seems OK to me.
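To make sure we're picturing the same scheme, here is a rough sketch of the mapping as I understand it. All of the names here (SLRU_BANK_SIZE, SLRU_MAX_BANKLOCKS, SlruBankedShared, and the two helpers) are invented for illustration and are not the patch's actual identifiers:

```c
#include "postgres.h"
#include "storage/lwlock.h"

#define SLRU_BANK_SIZE      16      /* buffers per bank, fixed */
#define SLRU_MAX_BANKLOCKS  128     /* cap on the number of bank LWLocks */

/* Illustrative shared state, not the patch's real struct. */
typedef struct SlruBankedShared
{
    int      num_banks;     /* configured buffers / SLRU_BANK_SIZE */
    LWLock  *bank_locks[SLRU_MAX_BANKLOCKS];
} SlruBankedShared;

/*
 * Map a page to its bank.  Bank B owns slots
 * [B * SLRU_BANK_SIZE, B * SLRU_BANK_SIZE + 15], so a buffer lookup
 * only ever scans 16 slots.  Because num_banks is a run-time value,
 * this stays a real modulo; it could only become a mask if the bank
 * count were a compile-time power of two.
 */
static inline int
SlruPageToBank(SlruBankedShared *shared, int pageno)
{
    return pageno % shared->num_banks;
}

/*
 * Pick the lock protecting a bank: one lock per bank while
 * num_banks <= 128, and locks shared across banks beyond that.
 */
static inline LWLock *
SlruBankLock(SlruBankedShared *shared, int bankno)
{
    return shared->bank_locks[bankno % SLRU_MAX_BANKLOCKS];
}
```

If that matches the patch, then the associativity point above is just that two hot pages whose page numbers are congruent modulo num_banks land in the same 16-slot bank and contend for the same lock.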
I also agree with a couple of points that Alvaro made, specifically that (1) this doesn't have to be perfect, just better than now, and (2) separate GUCs for each SLRU are fine. On the latter point, it's worth keeping in mind that the cost of a GUC that most people don't need to tune is fairly low. GUCs like work_mem and shared_buffers are "expensive" because everybody more or less needs to understand what they are and how to set them, and getting the right value can be tricky -- but a GUC like autovacuum_naptime is a lot cheaper, because almost nobody needs to change it.

It seems to me that these GUCs will fall into the latter category. Users can hopefully just ignore them unless they see contention on the SLRU bank locks -- and then they can consider increasing the number of banks for that particular SLRU. That seems simple enough. As with autovacuum_naptime, there is a danger that people will configure a ridiculous value of the parameter for no good reason and get bad results, so it would be nice if someday we had a magical system that just got all of this right without the user needing to configure anything. But in the meantime, it's better to have a somewhat manual system to relieve pressure on these locks than no system at all.

--
Robert Haas
EDB: http://www.enterprisedb.com