Re: BufferAlloc: don't take two simultaneous locks - Mailing list pgsql-hackers

From: Robert Haas
Subject: Re: BufferAlloc: don't take two simultaneous locks
Msg-id: CA+TgmobhUv-zotvjpZK2u-nz1H2W9PdR3BSpfjpWK=ugj4LRyA@mail.gmail.com
In response to: Re: BufferAlloc: don't take two simultaneous locks (Yura Sokolov <y.sokolov@postgrespro.ru>)
Responses: Re: BufferAlloc: don't take two simultaneous locks (Yura Sokolov <y.sokolov@postgrespro.ru>)
List: pgsql-hackers
On Thu, Apr 21, 2022 at 5:04 AM Yura Sokolov <y.sokolov@postgrespro.ru> wrote:
>   $ pid=`ps x | awk '/checkpointer/ && !/awk/ { print $1 }'`
>   $ gdb -p $pid -batch -ex 'p SharedBufHash->hctl->allocated.value'
>
>   $1 = 16512
>
>   $ install/bin/pgbench -c 600 -j 800 -T 10 -P 1 -S -M prepared postgres
>   ...
>   $ gdb -p $pid -batch -ex 'p SharedBufHash->hctl->allocated.value'
>
>   $1 = 20439
>
>   $ install/bin/pgbench -c 600 -j 800 -T 10 -P 1 -S -M prepared postgres
>   ...
>   $ gdb -p $pid -batch -ex 'p SharedBufHash->hctl->allocated.value'
>
>   $1 = 20541
>
> It stabilizes at 20541

Hmm. So is the existing comment incorrect? Remember, I was complaining
about this change:

--- a/src/backend/storage/buffer/freelist.c
+++ b/src/backend/storage/buffer/freelist.c
@@ -481,10 +481,10 @@ StrategyInitialize(bool init)
  *
  * Since we can't tolerate running out of lookup table entries, we must be
  * sure to specify an adequate table size here.  The maximum steady-state
- * usage is of course NBuffers entries, but BufferAlloc() tries to insert
- * a new entry before deleting the old.  In principle this could be
- * happening in each partition concurrently, so we could need as many as
- * NBuffers + NUM_BUFFER_PARTITIONS entries.
+ * usage is of course NBuffers entries. But due to concurrent
+ * access to numerous free lists in dynahash we can miss free entry that
+ * moved between free lists. So it is better to have some spare free entries
+ * to reduce probability of entry allocations after server start.
  */
  InitBufTable(NBuffers + NUM_BUFFER_PARTITIONS);

Pre-patch, the comment claims that the maximum number of buffer
entries that can be simultaneously used is limited to NBuffers +
NUM_BUFFER_PARTITIONS, and that's why we make the hash table that
size. The idea is that we normally need no more than 1 entry per buffer,
but sometimes we might have 2 entries for the same buffer if we're in
the process of changing the buffer tag, because we make the new entry
before removing the old one. To change the buffer tag, we need the
buffer mapping lock for the old partition and the new one, but if both
are the same, we need only one buffer mapping lock. That means that in
the worst case, you could have a number of processes equal to
NUM_BUFFER_PARTITIONS each in the process of changing the buffer tag
between values that both fall into the same partition, and thus each
using 2 entries. Then you could have every other buffer in use and
thus using 1 entry, for a total of NBuffers + NUM_BUFFER_PARTITIONS
entries. Now I think you're saying we go far beyond that number, and
what I wonder is how that's possible. If the system doesn't work the
way the comment says it does, maybe we ought to start by talking about
what to do about that.

I am a bit confused by your description of having done "p
SharedBufHash->hctl->allocated.value" because SharedBufHash is of type
HTAB and HTAB's hctl member is of type HASHHDR, which has no field
called "allocated". I thought maybe my analysis here was somehow
mistaken, so I tried the debugger, which took the same view of it that
I did:

(lldb) p SharedBufHash->hctl->allocated.value
error: <user expression 0>:1:22: no member named 'allocated' in 'HASHHDR'
SharedBufHash->hctl->allocated.value
~~~~~~~~~~~~~~~~~~~  ^

-- 
Robert Haas
EDB: http://www.enterprisedb.com


