Re: BufferAlloc: don't take two simultaneous locks - Mailing list pgsql-hackers
From: Kyotaro Horiguchi
Subject: Re: BufferAlloc: don't take two simultaneous locks
Date: Tue, 15 Mar 2022 16:25:45 +0900
Msg-id: 20220315.162545.2224728499612456194.horikyota.ntt@gmail.com
In response to: Re: BufferAlloc: don't take two simultaneous locks (Yura Sokolov <y.sokolov@postgrespro.ru>)
Responses: Re: BufferAlloc: don't take two simultaneous locks
List: pgsql-hackers
Thanks for the new version.

At Tue, 15 Mar 2022 08:07:39 +0300, Yura Sokolov <y.sokolov@postgrespro.ru> wrote in
> On Mon, 14/03/2022 at 14:57 +0300, Yura Sokolov wrote:
> > On Mon, 14/03/2022 at 17:12 +0900, Kyotaro Horiguchi wrote:
> > > At Mon, 14 Mar 2022 09:15:11 +0300, Yura Sokolov <y.sokolov@postgrespro.ru> wrote in
> > > > On Mon, 14/03/2022 at 14:31 +0900, Kyotaro Horiguchi wrote:
> > > I tried pgbench runs with scale 100 (with 10 threads, 10 clients) on
> > > 128kB shared buffers and I saw that get_hash_entry never takes the
> > > !element_alloc() path and always allocates a fresh entry, then
> > > saturates at 30 new elements allocated at the middle of a 100-second
> > > run.
> > >
> > > Then, I tried the same with the patch, and I was surprised to see
> > > that the rise of the number of newly allocated elements didn't stop
> > > and went up to 511 elements after the 100-second run. So I found
> > > that my concern was valid. The change in dynahash actually
> > > continuously/repeatedly causes a shortage of free list entries. I'm
> > > not sure how much impact it would have on performance if we changed
> > > get_hash_entry to prefer other freelists, though.
> >
> > Well, it is quite strange that SharedBufHash is not allocated as
> > HASH_FIXED_SIZE. Could you check what happens with this flag set?
> > I'll try as well.
> >
> > Another way to reduce the observed case is to remember freelist_idx
> > for the reused entry. I didn't believe it mattered much since entries
> > migrate nevertheless, but probably due to some hot buffers there is a
> > tendency to crowd a particular freelist.
>
> Well, I did both. Everything looks ok.

Hmm. v8 returns the stashed element to the freelist of its original
partition index when the element is *not* reused. But what I saw in the
previous test runs is the REUSE->ENTER(reuse)(->REMOVE) case. So the new
version looks like it behaves the same way as (or somehow even worse
than) the previous version: get_hash_entry continuously suffers from a
lack of freelist entries. (FWIW, attached are the test-output diffs for
both master and patched.)

master finally allocated 31 fresh elements over a 100s run.

> ALLOCED: 31         ;; freshly allocated

v8 finally borrowed 33620 times from another freelist and freshly
allocated 0 elements (ah, this version changes that..). Finally v8
results in:

> RETURNED: 50806     ;; returned stashed elements
> BORROWED: 33620     ;; borrowed from another freelist
> REUSED: 1812664     ;; stashed
> ASSIGNED: 1762377   ;; reused
> (ALLOCED: 0)        ;; freshly allocated

The runs suffer a huge degradation from the frequent elog calls, so the
numbers cannot be relied on naively, but they should show what is
happening well enough.

> I lost access to Xeon 8354H, so returned to old Xeon X5675.
...
> Strange thing: both master and patched version has higher
> peak tps at X5676 at medium connections (17 or 27 clients)
> than in first october version [1]. But lower tps at higher
> connections number (>= 191 clients).
> I'll try to bisect on master this unfortunate change.

The reversal of the preference order between fresh allocation and
borrowing from another freelist might be a factor.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
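For reference, a minimal sketch of the HASH_FIXED_SIZE suggestion quoted
above, applied to the existing ShmemInitHash() call in InitBufTable()
(buf_table.c). This is an illustration of the idea, not the tested patch:

    /*
     * With HASH_FIXED_SIZE, dynahash treats the initial allocation as
     * final: get_hash_entry() skips element_alloc() at runtime and can
     * only take from its own freelist or borrow from another
     * partition's freelist.
     */
    SharedBufHash = ShmemInitHash("Shared Buffer Lookup Table",
                                  size, size,
                                  &info,
                                  HASH_ELEM | HASH_BLOBS |
                                  HASH_PARTITION | HASH_FIXED_SIZE);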
Instrumentation diff for master:

diff --git a/src/backend/storage/buffer/buf_table.c b/src/backend/storage/buffer/buf_table.c
index dc439940fa..ac651b98e6 100644
--- a/src/backend/storage/buffer/buf_table.c
+++ b/src/backend/storage/buffer/buf_table.c
@@ -31,7 +31,7 @@ typedef struct
 	int			id;				/* Associated buffer ID */
 } BufferLookupEnt;
 
-static HTAB *SharedBufHash;
+HTAB	   *SharedBufHash;
 
 /*
diff --git a/src/backend/utils/hash/dynahash.c b/src/backend/utils/hash/dynahash.c
index 3babde8d70..294516ef01 100644
--- a/src/backend/utils/hash/dynahash.c
+++ b/src/backend/utils/hash/dynahash.c
@@ -195,6 +195,11 @@ struct HASHHDR
 	long		ssize;			/* segment size --- must be power of 2 */
 	int			sshift;			/* segment shift = log2(ssize) */
 	int			nelem_alloc;	/* number of entries to allocate at once */
+	int			alloc;
+	int			reuse;
+	int			borrow;
+	int			assign;
+	int			ret;
 
 #ifdef HASH_STATISTICS
 
@@ -963,6 +968,7 @@ hash_search(HTAB *hashp,
 								   foundPtr);
 }
 
+extern HTAB *SharedBufHash;
 void *
 hash_search_with_hash_value(HTAB *hashp,
 							const void *keyPtr,
@@ -1354,6 +1360,8 @@ get_hash_entry(HTAB *hashp, int freelist_idx)
 			hctl->freeList[freelist_idx].nentries++;
 			SpinLockRelease(&hctl->freeList[freelist_idx].mutex);
 
+			if (hashp == SharedBufHash)
+				elog(LOG, "BORROWED: %d", ++hctl->borrow);
 			return newElement;
 		}
 
@@ -1363,6 +1371,8 @@ get_hash_entry(HTAB *hashp, int freelist_idx)
 			/* no elements available to borrow either, so out of memory */
 			return NULL;
 		}
+		else if (hashp == SharedBufHash)
+			elog(LOG, "ALLOCED: %d", ++hctl->alloc);
 	}
 
 	/* remove entry from freelist, bump nentries */
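To show where the ALLOCED/BORROWED probes above fire, here is a
paraphrased sketch of master's get_hash_entry() preference order. The
helpers try_own_freelist() and borrow_from_other_freelists() are invented
names for logic that is open-coded in the real dynahash.c, so treat this
as an illustration, not the actual source:

    /*
     * Paraphrased: master prefers a fresh element_alloc() over borrowing
     * from another partition's freelist, which is the preference order
     * questioned at the end of the mail above.
     */
    static HASHBUCKET
    get_hash_entry_sketch(HTAB *hashp, int freelist_idx)
    {
        HASHHDR    *hctl = hashp->hctl;
        HASHBUCKET  newElement;

        for (;;)
        {
            /* 1. Take from this partition's own freelist if possible. */
            newElement = try_own_freelist(hctl, freelist_idx);
            if (newElement != NULL)
                return newElement;

            /* 2. Freelist empty: allocate a fresh batch of entries,
             * unless the table is HASH_FIXED_SIZE ("ALLOCED" above). */
            if (!hashp->isfixed &&
                element_alloc(hashp, hctl->nelem_alloc, freelist_idx))
                continue;

            /* 3. Only then steal from another partition's freelist
             * ("BORROWED" above); v8 swaps the order of 2 and 3. */
            return borrow_from_other_freelists(hashp, freelist_idx);
        }
    }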
Instrumentation diff for the patched version (v8):

diff --git a/src/backend/storage/buffer/buf_table.c b/src/backend/storage/buffer/buf_table.c
index 55bb491ad0..029bb89f26 100644
--- a/src/backend/storage/buffer/buf_table.c
+++ b/src/backend/storage/buffer/buf_table.c
@@ -31,7 +31,7 @@ typedef struct
 	int			id;				/* Associated buffer ID */
 } BufferLookupEnt;
 
-static HTAB *SharedBufHash;
+HTAB	   *SharedBufHash;
 
 /*
diff --git a/src/backend/utils/hash/dynahash.c b/src/backend/utils/hash/dynahash.c
index 50c0e47643..00159714d1 100644
--- a/src/backend/utils/hash/dynahash.c
+++ b/src/backend/utils/hash/dynahash.c
@@ -199,6 +199,11 @@ struct HASHHDR
 	int			nelem_alloc;	/* number of entries to allocate at once */
 	nalloced_t	nalloced;		/* number of entries allocated */
+	int			alloc;
+	int			reuse;
+	int			borrow;
+	int			assign;
+	int			ret;
 
 #ifdef HASH_STATISTICS
 
 	/*
@@ -1006,6 +1011,7 @@ hash_search(HTAB *hashp,
 								   foundPtr);
 }
 
+extern HTAB *SharedBufHash;
 void *
 hash_search_with_hash_value(HTAB *hashp,
 							const void *keyPtr,
@@ -1143,6 +1149,8 @@ hash_search_with_hash_value(HTAB *hashp,
 			DynaHashReuse.hashp = hashp;
 			DynaHashReuse.freelist_idx = freelist_idx;
 
+			if (hashp == SharedBufHash)
+				elog(LOG, "REUSED: %d", ++hctl->reuse);
 			/* Caller should call HASH_ASSIGN as the very next step. */
 			return (void *) ELEMENTKEY(currBucket);
 		}
@@ -1160,6 +1168,9 @@ hash_search_with_hash_value(HTAB *hashp,
 			if (likely(DynaHashReuse.element == NULL))
 				return (void *) ELEMENTKEY(currBucket);
 
+			if (hashp == SharedBufHash)
+				elog(LOG, "RETURNED: %d", ++hctl->ret);
+
 			freelist_idx = DynaHashReuse.freelist_idx;
 			/* if partitioned, must lock to touch nfree and freeList */
 			if (IS_PARTITIONED(hctl))
@@ -1191,6 +1202,13 @@ hash_search_with_hash_value(HTAB *hashp,
 			}
 			else
 			{
+				if (hashp == SharedBufHash)
+				{
+					hctl->assign++;
+					elog(LOG, "ASSIGNED: %d (%d)",
+						 hctl->assign, hctl->reuse - hctl->assign);
+				}
+
 				currBucket = DynaHashReuse.element;
 				DynaHashReuse.element = NULL;
 				DynaHashReuse.hashp = NULL;
@@ -1448,6 +1466,8 @@ get_hash_entry(HTAB *hashp, int freelist_idx)
 			hctl->freeList[borrow_from_idx].nfree--;
 			SpinLockRelease(&(hctl->freeList[borrow_from_idx].mutex));
 
+			if (hashp == SharedBufHash)
+				elog(LOG, "BORROWED: %d", ++hctl->borrow);
 			return newElement;
 		}
 
@@ -1457,6 +1477,10 @@ get_hash_entry(HTAB *hashp, int freelist_idx)
 			/* no elements available to borrow either, so out of memory */
 			return NULL;
 		}
+		else if (hashp == SharedBufHash)
+			elog(LOG, "ALLOCED: %d", ++hctl->alloc);
+
+
 	}
 
 	/* remove entry from freelist, decrease nfree */
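As far as I read it, the flow the v8 diff instruments is: eviction
stashes the victim's hash entry in DynaHashReuse instead of freeing it,
and the insert under the new tag consumes the stash, so BufferAlloc never
holds two partition locks at once. A condensed sketch using BufferAlloc's
local variable names; HASH_ASSIGN comes from the comment in the diff
above, while HASH_REUSE and the other details are my assumption about the
patch, not verbatim code:

    /* Evict under the old partition's lock only: stash the detached
     * entry in DynaHashReuse ("REUSED" in the log output). */
    LWLockAcquire(oldPartitionLock, LW_EXCLUSIVE);
    hash_search_with_hash_value(SharedBufHash, (void *) &oldTag,
                                oldHash, HASH_REUSE, NULL);
    LWLockRelease(oldPartitionLock);

    /* Insert under the new partition's lock only: HASH_ASSIGN consumes
     * the stashed element ("ASSIGNED"); if the stash goes unused, it is
     * put back on its original freelist ("RETURNED"). */
    LWLockAcquire(newPartitionLock, LW_EXCLUSIVE);
    ent = hash_search_with_hash_value(SharedBufHash, (void *) &newTag,
                                      newHash, HASH_ASSIGN, &found);
    LWLockRelease(newPartitionLock);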