Re: BufferAlloc: don't take two simultaneous locks - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject Re: BufferAlloc: don't take two simultaneous locks
Date
Msg-id 20220315.162545.2224728499612456194.horikyota.ntt@gmail.com
Whole thread Raw
In response to Re: BufferAlloc: don't take two simultaneous locks  (Yura Sokolov <y.sokolov@postgrespro.ru>)
Responses Re: BufferAlloc: don't take two simultaneous locks  (Yura Sokolov <y.sokolov@postgrespro.ru>)
List pgsql-hackers
Thanks for the new version.

At Tue, 15 Mar 2022 08:07:39 +0300, Yura Sokolov <y.sokolov@postgrespro.ru> wrote in
> В Пн, 14/03/2022 в 14:57 +0300, Yura Sokolov пишет:
> > В Пн, 14/03/2022 в 17:12 +0900, Kyotaro Horiguchi пишет:
> > > At Mon, 14 Mar 2022 09:15:11 +0300, Yura Sokolov <y.sokolov@postgrespro.ru> wrote in
> > > > В Пн, 14/03/2022 в 14:31 +0900, Kyotaro Horiguchi пишет:
> > > I tried pgbench runs with scale 100 (with 10 threads, 10 clients) on
> > > 128kB shared buffers and I saw that get_hash_entry never takes the
> > > !element_alloc() path and always allocate a fresh entry, then
> > > saturates at 30 new elements allocated at the medium of a 100 seconds
> > > run.
> > >
> > > Then, I tried the same with the patch, and I am surprized to see that
> > > the rise of the number of newly allocated elements didn't stop and
> > > went up to 511 elements after the 100 seconds run.  So I found that my
> > > concern was valid.  The change in dynahash actually
> > > continuously/repeatedly causes lack of free list entries.  I'm not
> > > sure how much the impact given on performance if we change
> > > get_hash_entry to prefer other freelists, though.
> >
> > Well, it is quite strange SharedBufHash is not allocated as
> > HASH_FIXED_SIZE. Could you check what happens with this flag set?
> > I'll try as well.
> >
> > Other way to reduce observed case is to remember freelist_idx for
> > reused entry. I didn't believe it matters much since entries migrated
> > netherless, but probably due to some hot buffers there are tention to
> > crowd particular freelist.
>
> Well, I did both. Everything looks ok.

Hmm. v8 returns stashed element with original patition index when the
element is *not* reused.  But what I saw in the previous test runs is
the REUSE->ENTER(reuse)(->REMOVE) case.  So the new version looks like
behaving the same way (or somehow even worse) with the previous
version.  get_hash_entry continuously suffer lack of freelist
entry. (FWIW, attached are the test-output diff for both master and
patched)

master finally allocated 31 fresh elements for a 100s run.

> ALLOCED: 31        ;; freshly allocated

v8 finally borrowed 33620 times from another freelist and 0 freshly
allocated (ah, this version changes that..)
Finally v8 results in:

> RETURNED: 50806    ;; returned stashed elements
> BORROWED: 33620    ;; borrowed from another freelist
> REUSED: 1812664    ;; stashed
> ASSIGNED: 1762377  ;; reused
>(ALLOCED: 0)        ;; freshly allocated

It contains a huge degradation by frequent elog's so they cannot be
naively relied on, but it should show what is happening sufficiently.

> I lost access to Xeon 8354H, so returned to old Xeon X5675.
...
> Strange thing: both master and patched version has higher
> peak tps at X5676 at medium connections (17 or 27 clients)
> than in first october version [1]. But lower tps at higher
> connections number (>= 191 clients).
> I'll try to bisect on master this unfortunate change.

The reversing of the preference order between freshly-allocation and
borrow-from-another-freelist might affect.


regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/storage/buffer/buf_table.c b/src/backend/storage/buffer/buf_table.c
index dc439940fa..ac651b98e6 100644
--- a/src/backend/storage/buffer/buf_table.c
+++ b/src/backend/storage/buffer/buf_table.c
@@ -31,7 +31,7 @@ typedef struct
     int            id;                /* Associated buffer ID */
 } BufferLookupEnt;
 
-static HTAB *SharedBufHash;
+HTAB *SharedBufHash;
 
 
 /*
diff --git a/src/backend/utils/hash/dynahash.c b/src/backend/utils/hash/dynahash.c
index 3babde8d70..294516ef01 100644
--- a/src/backend/utils/hash/dynahash.c
+++ b/src/backend/utils/hash/dynahash.c
@@ -195,6 +195,11 @@ struct HASHHDR
     long        ssize;            /* segment size --- must be power of 2 */
     int            sshift;            /* segment shift = log2(ssize) */
     int            nelem_alloc;    /* number of entries to allocate at once */
+    int alloc;
+    int reuse;
+    int borrow;
+    int assign;
+    int ret;
 
 #ifdef HASH_STATISTICS
 
@@ -963,6 +968,7 @@ hash_search(HTAB *hashp,
                                        foundPtr);
 }
 
+extern HTAB *SharedBufHash;
 void *
 hash_search_with_hash_value(HTAB *hashp,
                             const void *keyPtr,
@@ -1354,6 +1360,8 @@ get_hash_entry(HTAB *hashp, int freelist_idx)
                     hctl->freeList[freelist_idx].nentries++;
                     SpinLockRelease(&hctl->freeList[freelist_idx].mutex);
 
+                    if (hashp == SharedBufHash)
+                        elog(LOG, "BORROWED: %d", ++hctl->borrow);
                     return newElement;
                 }
 
@@ -1363,6 +1371,8 @@ get_hash_entry(HTAB *hashp, int freelist_idx)
             /* no elements available to borrow either, so out of memory */
             return NULL;
         }
+        else if (hashp == SharedBufHash)
+            elog(LOG, "ALLOCED: %d", ++hctl->alloc);
     }
 
     /* remove entry from freelist, bump nentries */
diff --git a/src/backend/storage/buffer/buf_table.c b/src/backend/storage/buffer/buf_table.c
index 55bb491ad0..029bb89f26 100644
--- a/src/backend/storage/buffer/buf_table.c
+++ b/src/backend/storage/buffer/buf_table.c
@@ -31,7 +31,7 @@ typedef struct
     int            id;                /* Associated buffer ID */
 } BufferLookupEnt;
 
-static HTAB *SharedBufHash;
+HTAB *SharedBufHash;
 
 
 /*
diff --git a/src/backend/utils/hash/dynahash.c b/src/backend/utils/hash/dynahash.c
index 50c0e47643..00159714d1 100644
--- a/src/backend/utils/hash/dynahash.c
+++ b/src/backend/utils/hash/dynahash.c
@@ -199,6 +199,11 @@ struct HASHHDR
     int            nelem_alloc;    /* number of entries to allocate at once */
     nalloced_t    nalloced;        /* number of entries allocated */
 
+    int alloc;
+    int reuse;
+    int borrow;
+    int assign;
+    int ret;
 #ifdef HASH_STATISTICS
 
     /*
@@ -1006,6 +1011,7 @@ hash_search(HTAB *hashp,
                                        foundPtr);
 }
 
+extern HTAB *SharedBufHash;
 void *
 hash_search_with_hash_value(HTAB *hashp,
                             const void *keyPtr,
@@ -1143,6 +1149,8 @@ hash_search_with_hash_value(HTAB *hashp,
                 DynaHashReuse.hashp = hashp;
                 DynaHashReuse.freelist_idx = freelist_idx;
 
+                if (hashp == SharedBufHash)
+                    elog(LOG, "REUSED: %d", ++hctl->reuse);
                 /* Caller should call HASH_ASSIGN as the very next step. */
                 return (void *) ELEMENTKEY(currBucket);
             }
@@ -1160,6 +1168,9 @@ hash_search_with_hash_value(HTAB *hashp,
                 if (likely(DynaHashReuse.element == NULL))
                     return (void *) ELEMENTKEY(currBucket);
 
+                if (hashp == SharedBufHash)
+                    elog(LOG, "RETURNED: %d", ++hctl->ret);
+
                 freelist_idx = DynaHashReuse.freelist_idx;
                 /* if partitioned, must lock to touch nfree and freeList */
                 if (IS_PARTITIONED(hctl))
@@ -1191,6 +1202,13 @@ hash_search_with_hash_value(HTAB *hashp,
             }
             else
             {
+                if (hashp == SharedBufHash)
+                {
+                    hctl->assign++;
+                    elog(LOG, "ASSIGNED: %d (%d)",
+                         hctl->assign, hctl->reuse - hctl->assign);
+                }
+                    
                 currBucket = DynaHashReuse.element;
                 DynaHashReuse.element = NULL;
                 DynaHashReuse.hashp = NULL;
@@ -1448,6 +1466,8 @@ get_hash_entry(HTAB *hashp, int freelist_idx)
                     hctl->freeList[borrow_from_idx].nfree--;
                     SpinLockRelease(&(hctl->freeList[borrow_from_idx].mutex));
 
+                    if (hashp == SharedBufHash)
+                        elog(LOG, "BORROWED: %d", ++hctl->borrow);
                     return newElement;
                 }
 
@@ -1457,6 +1477,10 @@ get_hash_entry(HTAB *hashp, int freelist_idx)
             /* no elements available to borrow either, so out of memory */
             return NULL;
         }
+        else if (hashp == SharedBufHash)
+            elog(LOG, "ALLOCED: %d", ++hctl->alloc);
+
+            
     }
 
     /* remove entry from freelist, decrease nfree */

pgsql-hackers by date:

Previous
From: Prabhat Sahu
Date:
Subject: Can we consider "24 Hours" for "next day" in INTERVAL datatype ?
Next
From: Bharath Rupireddy
Date:
Subject: Re: Allow async standbys wait for sync replication