Re: BufferAlloc: don't take two simultaneous locks - Mailing list pgsql-hackers

From Yura Sokolov
Subject Re: BufferAlloc: don't take two simultaneous locks
Msg-id 3f993baebf7d5a5448a962fd3e663d707045f3f8.camel@postgrespro.ru
In response to Re: BufferAlloc: don't take two simultaneous locks  (Yura Sokolov <y.sokolov@postgrespro.ru>)
List pgsql-hackers
On Tue, 15/03/2022 at 13:47 +0300, Yura Sokolov wrote:
> On Tue, 15/03/2022 at 16:25 +0900, Kyotaro Horiguchi wrote:
> > Thanks for the new version.
> > 
> > At Tue, 15 Mar 2022 08:07:39 +0300, Yura Sokolov <y.sokolov@postgrespro.ru> wrote in 
> > > On Mon, 14/03/2022 at 14:57 +0300, Yura Sokolov wrote:
> > > > On Mon, 14/03/2022 at 17:12 +0900, Kyotaro Horiguchi wrote:
> > > > > At Mon, 14 Mar 2022 09:15:11 +0300, Yura Sokolov <y.sokolov@postgrespro.ru> wrote in 
> > > > > > On Mon, 14/03/2022 at 14:31 +0900, Kyotaro Horiguchi wrote:
> > > > > I tried pgbench runs with scale 100 (with 10 threads, 10 clients) on
> > > > > 128kB shared buffers and saw that get_hash_entry never takes the
> > > > > !element_alloc() path and always allocates a fresh entry, then
> > > > > saturates at 30 new elements allocated by the middle of a 100-second
> > > > > run.
> > > > > 
> > > > > Then I tried the same with the patch, and I was surprised to see that
> > > > > the rise in the number of newly allocated elements didn't stop and
> > > > > went up to 511 elements after the 100-second run.  So I found that my
> > > > > concern was valid.  The change in dynahash actually
> > > > > continuously/repeatedly causes a lack of free list entries.  I'm not
> > > > > sure how much impact it has on performance if we change
> > > > > get_hash_entry to prefer other freelists, though.
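
For readers following along, the fallback order inside dynahash.c's
get_hash_entry() looks roughly like this (a condensed sketch; the
spinlocks and the body of the borrow loop are omitted):

    for (;;)
    {
        /* 1. take an element from our own freelist if we can */
        newElement = hctl->freeList[freelist_idx].freeList;
        if (newElement != NULL)
            break;

        /* 2. otherwise allocate a fresh batch of elements;
         * element_alloc() fails at once for HASH_FIXED_SIZE tables */
        if (!element_alloc(hashp, hctl->nelem_alloc, freelist_idx))
        {
            if (!IS_PARTITIONED(hctl))
                return NULL;    /* out of memory */

            /* 3. last resort for partitioned tables: scan the other
             * freelists and borrow an element from whichever has one */
        }
    }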
> > > > 
> > > > Well, it is quite strange that SharedBufHash is not allocated with
> > > > HASH_FIXED_SIZE. Could you check what happens with this flag set?
> > > > I'll try as well.
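
For concreteness, here is a minimal sketch of what I mean, assuming the
flag is simply OR'ed into InitBufTable() in buf_table.c (the actual
patch may end up looking different):

    void
    InitBufTable(int size)
    {
        HASHCTL     info;

        /* BufferTag maps to Buffer */
        info.keysize = sizeof(BufferTag);
        info.entrysize = sizeof(BufferLookupEnt);
        info.num_partitions = NUM_BUFFER_PARTITIONS;

        /* HASH_FIXED_SIZE makes the initial size a hard limit, so
         * element_alloc() can never be used to grow the table */
        SharedBufHash = ShmemInitHash("Shared Buffer Lookup Table",
                                      size, size,
                                      &info,
                                      HASH_ELEM | HASH_BLOBS |
                                      HASH_PARTITION | HASH_FIXED_SIZE);
    }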
> > > > 
> > > > Another way to reduce the observed effect is to remember freelist_idx
> > > > for the reused entry. I didn't believe it mattered much, since entries
> > > > migrate nevertheless, but probably due to some hot buffers there is a
> > > > tendency to crowd a particular freelist.
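
A hypothetical sketch of that second idea as it might look inside
dynahash.c (the function name and the way freelist_idx gets remembered
are made up here for illustration; nentries accounting is omitted):

    /* Put a stashed element back on the freelist it was originally
     * taken from, rather than the current partition's freelist. */
    static void
    return_stashed_entry(HTAB *hashp, HASHELEMENT *entry, int freelist_idx)
    {
        HASHHDR    *hctl = hashp->hctl;

        SpinLockAcquire(&hctl->freeList[freelist_idx].mutex);
        entry->link = hctl->freeList[freelist_idx].freeList;
        hctl->freeList[freelist_idx].freeList = entry;
        SpinLockRelease(&hctl->freeList[freelist_idx].mutex);
    }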
> > > 
> > > Well, I did both. Everything looks ok.
> > 
> > Hmm. v8 returns the stashed element with its original partition index
> > when the element is *not* reused.  But what I saw in the previous test
> > runs is the REUSE->ENTER(reuse)(->REMOVE) case.  So the new version looks
> > like it behaves the same way as (or somehow even worse than) the previous
> > version.
> 
> In the REMOVE case, v8 differs neither from master nor from the
> previous version. It differs in the RETURNED case only.
> Or maybe I didn't understand what you mean :(
> 
> > get_hash_entry continuously suffers from a lack of freelist
> > entries. (FWIW, attached are the test-output diffs for both master and
> > patched.)
> > 
> > master finally allocated 31 fresh elements for a 100s run.
> > 
> > > ALLOCED: 31        ;; freshly allocated
> > 
> > v8 finally borrowed 33620 times from another freelist and freshly
> > allocated 0 elements (ah, this version changes that..)
> > Finally v8 results in:
> > 
> > > RETURNED: 50806    ;; returned stashed elements
> > > BORROWED: 33620    ;; borrowed from another freelist
> > > REUSED: 1812664    ;; stashed
> > > ASSIGNED: 1762377  ;; reused
> > > (ALLOCED: 0)        ;; freshly allocated
> > 
> > The numbers include a huge degradation from the frequent elog() calls,
> > so they cannot be naively relied on, but they should show what is
> > happening well enough.
> 
> Is there any measurable performance hit caused by the borrowing?
> It looks like borrowing happened about 1.5% of the time. And that is with
> 128kB shared buffers, which is extremely small. (Or was it 128MB?)
> 
> Well, I think some spare entries could reduce borrowing if there is
> a need. I'll test on 128MB with spare entries. If there is a benefit,
> I'll keep some, but will keep SharedBufHash fixed-size.

Well, I added GetMaxBackends spare items, but I don't see a clear
benefit. It is probably a bit better with 128MB shared buffers and
probably a bit worse with 1GB shared buffers (select_only on scale 100).
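
For reference, this is roughly what "spare items" means, assuming the
extra entries are simply added to the table size in StrategyInitialize()
in freelist.c (the exact arithmetic in the patch may differ):

    /*
     * Master already reserves NUM_BUFFER_PARTITIONS entries of headroom
     * because BufferAlloc() inserts the new entry before deleting the
     * old one; the sketch below adds one spare entry per backend on top.
     */
    InitBufTable(NBuffers + NUM_BUFFER_PARTITIONS + GetMaxBackends());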

But that is on an old Xeon X5675. Things will probably change on more
capable hardware; I just don't have access to any at the moment.

> 
> The master branch does fewer freelist manipulations, since it tries to
> insert first, and if there is a collision it doesn't delete the victim
> buffer's hashtable entry.
> 
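
To illustrate why: a simplified sketch of the relevant ordering in
master's BufferAlloc() in bufmgr.c (pinning, error handling and lock
releases are omitted; both partition locks are held at this point):

    buf_id = BufTableInsert(&newTag, newHash, buf->buf_id);
    if (buf_id >= 0)
    {
        /* Collision: another backend created the page first.  The
         * victim buffer is given back with its old hashtable entry
         * intact, so the freelists are not touched at all. */
    }
    else if (oldFlags & BM_TAG_VALID)
    {
        /* The insert succeeded; only now is the victim's old entry
         * removed, balancing the element taken by the insert. */
        BufTableDelete(&oldTag, oldHash);
    }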

-----

regards
Yura



