Re: BufferAlloc: don't take two simultaneous locks - Mailing list pgsql-hackers
| From | Yura Sokolov |
|---|---|
| Subject | Re: BufferAlloc: don't take two simultaneous locks |
| Date | |
| Msg-id | 3222ea2f83d59fdb1c765cc8fc22271163e0a3dc.camel@postgrespro.ru |
| In response to | Re: BufferAlloc: don't take two simultaneous locks (Robert Haas <robertmhaas@gmail.com>) |
| Responses | Re: BufferAlloc: don't take two simultaneous locks; Re: BufferAlloc: don't take two simultaneous locks |
| List | pgsql-hackers |
On Thu, 21/04/2022 at 16:24 -0400, Robert Haas wrote:
> On Thu, Apr 21, 2022 at 5:04 AM Yura Sokolov <y.sokolov@postgrespro.ru> wrote:
> > $ pid=`ps x | awk '/checkpointer/ && !/awk/ { print $1 }'`
> > $ gdb -p $pid -batch -ex 'p SharedBufHash->hctl->allocated.value'
> >
> > $1 = 16512
> >
> > $ install/bin/pgbench -c 600 -j 800 -T 10 -P 1 -S -M prepared postgres
> > ...
> > $ gdb -p $pid -batch -ex 'p SharedBufHash->hctl->allocated.value'
> >
> > $1 = 20439
> >
> > $ install/bin/pgbench -c 600 -j 800 -T 10 -P 1 -S -M prepared postgres
> > ...
> > $ gdb -p $pid -batch -ex 'p SharedBufHash->hctl->allocated.value'
> >
> > $1 = 20541
> >
> > It stabilizes at 20541
>
> Hmm. So is the existing comment incorrect?

It is correct and incorrect at the same time. Logically it is correct. And it
is correct in practice if HASH_FIXED_SIZE is set for SharedBufHash (which it
currently is not). But setting HASH_FIXED_SIZE hurts performance when there
are only a few spare items.

> Remember, I was complaining
> about this change:
>
> --- a/src/backend/storage/buffer/freelist.c
> +++ b/src/backend/storage/buffer/freelist.c
> @@ -481,10 +481,10 @@ StrategyInitialize(bool init)
>  *
>  * Since we can't tolerate running out of lookup table entries, we must be
>  * sure to specify an adequate table size here. The maximum steady-state
> - * usage is of course NBuffers entries, but BufferAlloc() tries to insert
> - * a new entry before deleting the old. In principle this could be
> - * happening in each partition concurrently, so we could need as many as
> - * NBuffers + NUM_BUFFER_PARTITIONS entries.
> + * usage is of course NBuffers entries. But due to concurrent
> + * access to numerous free lists in dynahash we can miss free entry that
> + * moved between free lists. So it is better to have some spare free entries
> + * to reduce probability of entry allocations after server start.
>  */
>  InitBufTable(NBuffers + NUM_BUFFER_PARTITIONS);
>
> Pre-patch, the comment claims that the maximum number of buffer
> entries that can be simultaneously used is limited to NBuffers +
> NUM_BUFFER_PARTITIONS, and that's why we make the hash table that
> size. The idea is that we normally need no more than 1 entry per buffer,
> but sometimes we might have 2 entries for the same buffer if we're in
> the process of changing the buffer tag, because we make the new entry
> before removing the old one. To change the buffer tag, we need the
> buffer mapping lock for the old partition and the new one, but if both
> are the same, we need only one buffer mapping lock. That means that in
> the worst case, you could have a number of processes equal to
> NUM_BUFFER_PARTITIONS each in the process of changing the buffer tag
> between values that both fall into the same partition, and thus each
> using 2 entries. Then you could have every other buffer in use and
> thus using 1 entry, for a total of NBuffers + NUM_BUFFER_PARTITIONS
> entries. Now I think you're saying we go far beyond that number, and
> what I wonder is how that's possible. If the system doesn't work the
> way the comment says it does, maybe we ought to start by talking about
> what to do about that.

On current master:
- SharedBufHash is not declared with HASH_FIXED_SIZE;
- get_hash_entry falls back to element_alloc too eagerly (as soon as it
  doesn't find a free entry in the current freelist partition; see the
  sketch below);
- get_hash_entry has races;
- if there are only a few spare items (and NUM_BUFFER_PARTITIONS is a small
  number) and HASH_FIXED_SIZE is set, it becomes contended and therefore
  slow.
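To illustrate the second and third points, here is roughly what master's
get_hash_entry() does for a partitioned table such as SharedBufHash. This is
paraphrased from dynahash.c, not quoted verbatim; in particular the borrow
loop is inlined in the real code, I write it as a helper here only for
brevity:

    static HASHBUCKET
    get_hash_entry(HTAB *hashp, int freelist_idx)
    {
        HASHHDR    *hctl = hashp->hctl;
        HASHBUCKET  entry;

        for (;;)
        {
            /* 1. Look only into our own freelist partition. */
            SpinLockAcquire(&hctl->freeList[freelist_idx].mutex);
            entry = hctl->freeList[freelist_idx].freeList;
            if (entry != NULL)
                break;
            SpinLockRelease(&hctl->freeList[freelist_idx].mutex);

            /*
             * 2. Our partition is empty: immediately allocate a fresh chunk
             * from shared memory, even though other partitions may still
             * hold plenty of free entries.  (With HASH_FIXED_SIZE,
             * element_alloc() refuses right away and we go straight to
             * borrowing.)
             */
            if (element_alloc(hashp, hctl->nelem_alloc, freelist_idx))
                continue;

            /*
             * 3. Only as a last resort, walk the other freelist partitions
             * one by one and steal an entry.  An entry freed concurrently
             * into a partition we have already passed is missed, so this
             * scan can fail even though free entries exist.
             */
            return borrow_from_other_freelists(hashp, freelist_idx);
        }

        /* unlink entry from our freelist, nentries++, release the spinlock */
        return entry;
    }

So without HASH_FIXED_SIZE the table simply grows past its nominal size
whenever step 1 misses, which is what the allocated counter above shows.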
HASH_REUSE solves most of these issues (for shared buffers). The free list
becomes a rare fallback, so HASH_FIXED_SIZE for SharedBufHash doesn't lead to
a performance hit. And with a fair number of spare items, get_hash_entry will
find a free entry despite its races.

> I am a bit confused by your description of having done "p
> SharedBufHash->hctl->allocated.value" because SharedBufHash is of type
> HTAB and HTAB's hctl member is of type HASHHDR, which has no field
> called "allocated".

My previous letter contains links to the small patches I used for these
experiments. The one that adds "allocated" is https://pastebin.com/c5z0d5mz

> I thought maybe my analysis here was somehow
> mistaken, so I tried the debugger, which took the same view of it that
> I did:
>
> (lldb) p SharedBufHash->hctl->allocated.value
> error: <user expression 0>:1:22: no member named 'allocated' in 'HASHHDR'
> SharedBufHash->hctl->allocated.value
> ~~~~~~~~~~~~~~~~~~~ ^

-----

regards
Yura Sokolov
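P.S. To be explicit about why the debugger complained: "allocated" is not a
field of HASHHDR in upstream, it only exists with the debug patch linked
above. The idea (a sketch from memory; the actual code at the link may differ
in details) is just an atomic counter in dynahash.c's private struct HASHHDR,
bumped each time element_alloc() carves new entries out of shared memory, so
gdb with debug symbols can print it as SharedBufHash->hctl->allocated.value:

    /* src/backend/utils/hash/dynahash.c -- debug-only sketch, needs "port/atomics.h" */
    struct HASHHDR
    {
        /* ... existing fields ... */
        pg_atomic_uint64 allocated;     /* total entries ever allocated */
    };

    /* in init_htab(), once the header is set up: */
    pg_atomic_init_u64(&hctl->allocated, 0);

    /* in element_alloc(), after allocating a chunk of nelem entries: */
    pg_atomic_fetch_add_u64(&hctl->allocated, nelem);

Since the counter only ever grows, the numbers above are a high-water mark:
16512 at startup is presumably the preallocated NBuffers (16384) +
NUM_BUFFER_PARTITIONS (128) entries, and under load it grows to 20541 because
the table is not HASH_FIXED_SIZE.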