Re: BufferAlloc: don't take two simultaneous locks - Mailing list pgsql-hackers
| From | Yura Sokolov |
| --- | --- |
| Subject | Re: BufferAlloc: don't take two simultaneous locks |
| Msg-id | 91d6d0f9eae224d690833bc859753f79e36f2f3a.camel@postgrespro.ru |
| In response to | Re: BufferAlloc: don't take two simultaneous locks (Yura Sokolov <y.sokolov@postgrespro.ru>) |
| Responses | Re: BufferAlloc: don't take two simultaneous locks |
| List | pgsql-hackers |
On Mon, 14/03/2022 at 14:57 +0300, Yura Sokolov wrote:
> On Mon, 14/03/2022 at 17:12 +0900, Kyotaro Horiguchi wrote:
> > At Mon, 14 Mar 2022 09:15:11 +0300, Yura Sokolov <y.sokolov@postgrespro.ru> wrote in
> > > On Mon, 14/03/2022 at 14:31 +0900, Kyotaro Horiguchi wrote:
> > > > I'd like to ask you to remove nalloced from partitions and then add a
> > > > global atomic for the same use.
> > >
> > > I really believe it should be global. I made it per-partition to
> > > avoid overcomplicating the first versions. Glad you mention it.
> > >
> > > I thought to protect it with freeList[0].mutex, but probably an atomic
> > > is a better idea here. But which atomic to choose: uint64 or uint32?
> > > Based on sizeof(long)?
> > > Ok, I'll do it in the next version.
> >
> > The current nentries is a long (= int64 on CentOS), and uint32 can support
> > roughly 2^32 * 8192 = 32TB of shared buffers, which doesn't seem safe
> > enough. So it would be uint64.
> >
> > > The whole of get_hash_entry looks strange.
> > > Wouldn't it be better to cycle through partitions and only then go to
> > > get_hash_entry?
> > > Maybe there should be a bitmap of non-empty free lists? 32 bits for
> > > 32 partitions. But wouldn't the bitmap become a contention point itself?
> >
> > The code puts significance on avoiding contention caused by visiting
> > freelists of other partitions, and perhaps assumes that freelist
> > shortage rarely happens.
> >
> > I tried pgbench runs with scale 100 (with 10 threads, 10 clients) on
> > 128kB shared buffers and I saw that get_hash_entry never takes the
> > !element_alloc() path and always allocates a fresh entry, then
> > saturates at 30 new elements allocated around the middle of a
> > 100-second run.
> >
> > Then I tried the same with the patch, and I was surprised to see that
> > the rise in the number of newly allocated elements didn't stop and
> > went up to 511 elements after the 100-second run. So I found that my
> > concern was valid. The change in dynahash actually
> > continuously/repeatedly causes a lack of free list entries. I'm not
> > sure how much impact it would have on performance if we change
> > get_hash_entry to prefer other freelists, though.
>
> Well, it is quite strange that SharedBufHash is not allocated as
> HASH_FIXED_SIZE. Could you check what happens with this flag set?
> I'll try as well.
>
> Another way to reduce the observed case is to remember freelist_idx for
> the reused entry. I didn't believe it matters much since entries migrate
> nevertheless, but probably due to some hot buffers there is a tendency to
> crowd a particular freelist.

Well, I did both. Everything looks ok.

> > By the way, there's the following comment in StrategyInitialize.
> >
> > >      * Initialize the shared buffer lookup hashtable.
> > >      *
> > >      * Since we can't tolerate running out of lookup table entries, we must be
> > >      * sure to specify an adequate table size here.  The maximum steady-state
> > >      * usage is of course NBuffers entries, but BufferAlloc() tries to insert
> > >      * a new entry before deleting the old.  In principle this could be
> > >      * happening in each partition concurrently, so we could need as many as
> > >      * NBuffers + NUM_BUFFER_PARTITIONS entries.
> > >      */
> > >     InitBufTable(NBuffers + NUM_BUFFER_PARTITIONS);
> >
> > "but BufferAlloc() tries to insert a new entry before deleting the
> > old." becomes false with this patch, but we still need that additional
> > room for stashed entries. It seems to need a fix.

Removed the whole paragraph, because a fixed table without extra entries works
just fine.
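To make the HASH_FIXED_SIZE experiment concrete: it essentially amounts to one
extra flag in InitBufTable() (buf_table.c). The snippet below is only a sketch
of how the ShmemInitHash() call could look, not the exact change I tested:

    /* InitBufTable() sketch: HASH_FIXED_SIZE forbids allocating new
     * elements after startup, so the element pool stays exactly at the
     * requested size (and the requested size can then be just NBuffers,
     * without the NUM_BUFFER_PARTITIONS slack). */
    SharedBufHash = ShmemInitHash("Shared Buffer Lookup Table",
                                  size, size,
                                  &info,
                                  HASH_ELEM | HASH_BLOBS | HASH_PARTITION |
                                  HASH_FIXED_SIZE);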
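And for the global counter discussed above, a rough sketch of the dynahash
side, using the existing pg_atomic_uint64 API from port/atomics.h; the field
placement and call sites are illustrative, not the final patch:

    /* In struct HASHHDR (dynahash.c): one shared counter instead of a
     * per-freelist nalloced field.  uint64, because a uint32 count of
     * 8kB buffers would top out around 2^32 * 8192 = 32TB. */
    pg_atomic_uint64 nalloced;      /* number of entries ever allocated */

    /* init_htab(): set the counter up once */
    pg_atomic_init_u64(&hctl->nalloced, 0);

    /* element_alloc(): account for freshly allocated elements without
     * touching any freelist mutex */
    pg_atomic_fetch_add_u64(&hctl->nalloced, nelem);

    /* readers only need a plain atomic read */
    uint64 total_allocated = pg_atomic_read_u64(&hctl->nalloced);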
I lost access to the Xeon 8354H, so I returned to the old Xeon X5675.

128MB and 1GB shared buffers, pgbench with scale 100, select_only benchmark,
unix sockets.

Notebook i7-1165G7:

 conns  |   master   |     v8     | master 1G  |   v8 1G
--------+------------+------------+------------+------------
      1 |      29614 |      29285 |      32413 |      32784
      2 |      58541 |      60052 |      65851 |      65938
      3 |      91126 |      90185 |     101404 |     101956
      5 |     135809 |     133670 |     143783 |     143471
      7 |     155547 |     153568 |     162566 |     162361
     17 |     221794 |     218143 |     250562 |     250136
     27 |     213742 |     211226 |     241806 |     242594
     53 |     216067 |     214792 |     245868 |     246269
     83 |     216610 |     218261 |     246798 |     250515
    107 |     216169 |     216656 |     248424 |     250105
    139 |     208892 |     215054 |     244630 |     246439
    163 |     206988 |     212751 |     244061 |     248051
    191 |     203842 |     214764 |     241793 |     245081
    211 |     201304 |     213997 |     240863 |     246076
    239 |     199313 |     211713 |     239639 |     243586
    271 |     196712 |     211849 |     236231 |     243831
    307 |     194879 |     209813 |     233811 |     241303
    353 |     191279 |     210145 |     230896 |     241039
    397 |     188509 |     207480 |     227812 |     240637

X5675 1 socket:

 conns  |   master   |     v8     | master 1G  |   v8 1G
--------+------------+------------+------------+------------
      1 |      18590 |      18473 |      19652 |      19051
      2 |      34899 |      34799 |      37242 |      37432
      3 |      51484 |      51393 |      54750 |      54398
      5 |      71037 |      70564 |      76482 |      75985
      7 |      87391 |      86937 |      96185 |      95433
     17 |     122609 |     123087 |     140578 |     140325
     27 |     120051 |     120508 |     136318 |     136343
     53 |     116851 |     117601 |     133338 |     133265
     83 |     113682 |     116755 |     131841 |     132736
    107 |     111925 |     116003 |     130661 |     132386
    139 |     109338 |     115011 |     128319 |     131453
    163 |     107661 |     114398 |     126684 |     130677
    191 |     105000 |     113745 |     124850 |     129909
    211 |     103607 |     113347 |     123469 |     129302
    239 |     101820 |     112428 |     121752 |     128621
    271 |     100060 |     111863 |     119743 |     127624
    307 |      98554 |     111270 |     117650 |     126877
    353 |      97530 |     110231 |     115904 |     125351
    397 |      96122 |     109471 |     113609 |     124150

X5675 2 sockets:

 conns  |   master   |     v8     | master 1G  |   v8 1G
--------+------------+------------+------------+------------
      1 |      17815 |      17577 |      19321 |      19187
      2 |      34312 |      35655 |      37121 |      36479
      3 |      51868 |      52165 |      56048 |      54984
      5 |      81704 |      82477 |      90945 |      90109
      7 |     107937 |     105411 |     116015 |     115810
     17 |     191339 |     190813 |     216899 |     215775
     27 |     236541 |     238078 |     278507 |     278073
     53 |     230323 |     231709 |     267226 |     267449
     83 |     225560 |     227455 |     261996 |     262344
    107 |     221317 |     224030 |     259694 |     259553
    139 |     206945 |     219005 |     254817 |     256736
    163 |     197723 |     220353 |     251631 |     257305
    191 |     193243 |     219149 |     246960 |     256528
    211 |     189603 |     218545 |     245362 |     255785
    239 |     186382 |     217229 |     240006 |     255024
    271 |     183141 |     216359 |     236927 |     253069
    307 |     179275 |     215218 |     232571 |     252375
    353 |     175559 |     213298 |     227244 |     250534
    397 |     172916 |     211627 |     223513 |     248919

Strange thing: both master and the patched version have a higher peak tps on
the X5675 at medium connection counts (17 or 27 clients) than the first
October version did [1], but lower tps at higher connection counts
(>= 191 clients). I'll try to bisect this unfortunate change on master.
October master was 2d44dee0281a1abf and today's is 7e12256b478b895.
(There is a small possibility that I tested with TCP sockets in October and
with unix sockets today, and that made the difference.)

[1] https://postgr.es/m/1edbb61981fe1d99c3f20e3d56d6c88999f4227c.camel%40postgrespro.ru

-------

regards
Yura Sokolov
Postgres Professional
y.sokolov@postgrespro.ru