Re: BufferAlloc: don't take two simultaneous locks - Mailing list pgsql-hackers
From: Kyotaro Horiguchi
Subject: Re: BufferAlloc: don't take two simultaneous locks
Date:
Msg-id: 20220314.171248.32647856395543307.horikyota.ntt@gmail.com
In response to: Re: BufferAlloc: don't take two simultaneous locks (Yura Sokolov <y.sokolov@postgrespro.ru>)
Responses: Re: BufferAlloc: don't take two simultaneous locks; Re: BufferAlloc: don't take two simultaneous locks
List: pgsql-hackers
At Mon, 14 Mar 2022 09:15:11 +0300, Yura Sokolov <y.sokolov@postgrespro.ru> wrote in
> On Mon, 14/03/2022 at 14:31 +0900, Kyotaro Horiguchi wrote:
> > I'd like to ask you to remove nalloced from partitions then add a
> > global atomic for the same use?
>
> I really believe it should be global. I made it per-partition to
> avoid overcomplicating the first versions. Glad you brought it up.
>
> I thought to protect it with freeList[0].mutex, but probably an atomic
> is the better idea here. But which atomic to choose: uint64 or uint32?
> Based on sizeof(long)?
> Ok, I'll do it in the next version.

The current nentries is a long (= int64 on CentOS), and a uint32 can cover only about 2^32 * 8192 bytes = 32TB of shared buffers, which doesn't seem a safe enough margin. So it should be uint64.

> The whole get_hash_entry looks strange.
> Wouldn't it be better to cycle through partitions and only then go to
> get_hash_entry?
> Maybe there should be a bitmap for non-empty free lists? 32 bits for
> 32 partitions. But wouldn't the bitmap become a contention point itself?

The code puts significance on avoiding the contention caused by visiting the freelists of other partitions, and perhaps assumes that freelist shortage rarely happens.

I tried pgbench runs at scale 100 (10 threads, 10 clients) on 128kB shared buffers and saw that get_hash_entry never takes the !element_alloc() path: it always allocates a fresh entry, and the count of newly allocated elements saturates at 30 around the middle of a 100-second run.

Then I tried the same with the patch, and was surprised to see that the number of newly allocated elements never stopped rising, reaching 511 after the 100-second run.

So my concern was valid: the change in dynahash actually causes the free lists to run short continuously/repeatedly. I'm not sure how much impact changing get_hash_entry to prefer other freelists would have on performance, though.

By the way, there's the following comment in StrategyInitialize:

>  * Initialize the shared buffer lookup hashtable.
>  *
>  * Since we can't tolerate running out of lookup table entries, we must be
>  * sure to specify an adequate table size here.  The maximum steady-state
>  * usage is of course NBuffers entries, but BufferAlloc() tries to insert
>  * a new entry before deleting the old.  In principle this could be
>  * happening in each partition concurrently, so we could need as many as
>  * NBuffers + NUM_BUFFER_PARTITIONS entries.
>  */
> InitBufTable(NBuffers + NUM_BUFFER_PARTITIONS);

"but BufferAlloc() tries to insert a new entry before deleting the old" becomes false with this patch, but we still need the additional room for stashed entries. The comment seems to need a fix.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center