Re: BufferAlloc: don't take two simultaneous locks - Mailing list pgsql-hackers
From | Kyotaro Horiguchi |
---|---|
Subject | Re: BufferAlloc: don't take two simultaneous locks |
Date | |
Msg-id | 20220316.120747.974686338659732838.horikyota.ntt@gmail.com Whole thread Raw |
In response to | Re: BufferAlloc: don't take two simultaneous locks (Yura Sokolov <y.sokolov@postgrespro.ru>) |
Responses |
Re: BufferAlloc: don't take two simultaneous locks
|
List | pgsql-hackers |
At Tue, 15 Mar 2022 13:47:17 +0300, Yura Sokolov <y.sokolov@postgrespro.ru> wrote in > В Вт, 15/03/2022 в 16:25 +0900, Kyotaro Horiguchi пишет: > > Hmm. v8 returns stashed element with original patition index when the > > element is *not* reused. But what I saw in the previous test runs is > > the REUSE->ENTER(reuse)(->REMOVE) case. So the new version looks like > > behaving the same way (or somehow even worse) with the previous > > version. > > v8 doesn't differ in REMOVE case neither from master nor from > previous version. It differs in RETURNED case only. > Or I didn't understand what you mean :( In v7, HASH_ENTER returns the element stored in DynaHashReuse using the freelist_idx of the new key. v8 uses that of the old key (at the time of HASH_REUSE). So in the case "REUSE->ENTER(elem exists and returns the stashed)" case the stashed element is returned to its original partition. But it is not what I mentioned. On the other hand, once the stahsed element is reused by HASH_ENTER, it gives the same resulting state with HASH_REMOVE->HASH_ENTER(borrow from old partition) case. I suspect that ththat the frequent freelist starvation comes from the latter case. > > get_hash_entry continuously suffer lack of freelist > > entry. (FWIW, attached are the test-output diff for both master and > > patched) > > > > master finally allocated 31 fresh elements for a 100s run. > > > > > ALLOCED: 31 ;; freshly allocated > > > > v8 finally borrowed 33620 times from another freelist and 0 freshly > > allocated (ah, this version changes that..) > > Finally v8 results in: > > > > > RETURNED: 50806 ;; returned stashed elements > > > BORROWED: 33620 ;; borrowed from another freelist > > > REUSED: 1812664 ;; stashed > > > ASSIGNED: 1762377 ;; reused > > >(ALLOCED: 0) ;; freshly allocated (I misunderstand that v8 modified get_hash_entry's preference between allocation and borrowing.) I re-ran the same check for v7 and it showed different result. RETURNED: 1 ALLOCED: 15 BORROWED: 0 REUSED: 505435 ASSIGNED: 505462 (-27) ## the counters are not locked. > Is there any measurable performance hit cause of borrowing? > Looks like "borrowed" happened in 1.5% of time. And it is on 128kb > shared buffers that is extremely small. (Or it was 128MB?) It is intentional set small to get extremely frequent buffer replacements. The point here was the patch actually can induce frequent freelist starvation. And as you do, I also doubt the significance of the performance hit by that. Just I was not usre. I re-ran the same for v8 and got a result largely different from the previous trial on the same v8. RETURNED: 2 ALLOCED: 0 BORROWED: 435 REUSED: 495444 ASSIGNED: 495467 (-23) Now "BORROWED" happens 0.8% of REUSED. > Well, I think some spare entries could reduce borrowing if there is > a need. I'll test on 128MB with spare entries. If there is profit, > I'll return some, but will keep SharedBufHash fixed. I don't doubt the benefit of this patch. And now convinced by myself that the downside is negligible than the benefit. > Master branch does less freelist manipulations since it tries to > insert first and if there is collision it doesn't delete victim > buffer. > > > > I lost access to Xeon 8354H, so returned to old Xeon X5675. > > ... > > > Strange thing: both master and patched version has higher > > > peak tps at X5676 at medium connections (17 or 27 clients) > > > than in first october version [1]. But lower tps at higher > > > connections number (>= 191 clients). > > > I'll try to bisect on master this unfortunate change. > > > > The reversing of the preference order between freshly-allocation and > > borrow-from-another-freelist might affect. > > `master` changed its behaviour as well. > It is not problem of the patch at all. Agreed. So I think we should go on this direction. There are some last comments on v8. + HASH_FIXED_SIZE); Ah, now I understand that this prevented allocation of new elements. I think this good to do for SharedBufHash. ==== + long nfree; /* number of free entries in the list */ HASHELEMENT *freeList; /* chain of free elements */ } FreeListData; +#if SIZEOF_LONG == 4 +typedef pg_atomic_uint32 nalloced_store_t; +typedef uint32 nalloced_value_t; +#define nalloced_read(a) (long)pg_atomic_read_u32(a) +#define nalloced_add(a, v) pg_atomic_fetch_add_u32((a), (uint32)(v)) ==== I don't think nalloced needs to be the same width to long. For the platforms with 32-bit long, anyway the possible degradation if any by 64-bit atomic there doesn't matter. So don't we always define the atomic as 64bit and use the pg_atomic_* functions directly? + case HASH_REUSE: + if (currBucket != NULL) Don't we need an assertion on (DunaHashReuse.element == NULL) here? - size = add_size(size, BufTableShmemSize(NBuffers + NUM_BUFFER_PARTITIONS)); + /* size of lookup hash table */ + size = add_size(size, BufTableShmemSize(NBuffers)); I was not sure that this is safe, but actually I didn't get "out of shared memory". On second thought, I realized that when a dynahash entry is stashed, BufferAlloc always holding a buffer block, too. So now I'm sure that this is safe. That's all. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
pgsql-hackers by date: