Re: BufferAlloc: don't take two simultaneous locks - Mailing list pgsql-hackers

From Yura Sokolov
Subject Re: BufferAlloc: don't take two simultaneous locks
Date
Msg-id 75fe5128f2550bf0d198ed222f1abd6f921aec77.camel@postgrespro.ru
Whole thread Raw
In response to Re: BufferAlloc: don't take two simultaneous locks  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Responses Re: BufferAlloc: don't take two simultaneous locks  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
List pgsql-hackers
В Пн, 14/03/2022 в 14:31 +0900, Kyotaro Horiguchi пишет:
> At Mon, 14 Mar 2022 09:39:48 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in 
> > I'll examine the possibility to resolve this...
> 
> The existence of nfree and nalloc made me confused and I found the
> reason.
> 
> In the case where a parittion collects many REUSE-ASSIGN-REMOVEed
> elemetns from other paritiotns, nfree gets larger than nalloced.  This
> is a strange point of the two counters.  nalloced is only referred to
> as (sum(nalloced[])).  So we don't need nalloced per-partition basis
> and the formula to calculate the number of used elements would be as
> follows.
> 
>  sum(nalloced - nfree)
>  = <total_nalloced> - sum(nfree)
> 
> We rarely create fresh elements in shared hashes so I don't think
> there's additional contention on the <total_nalloced> even if it were
> a global atomic.
> 
> So, the remaining issue is the possible imbalancement among
> partitions.  On second thought, by the current way, if there's a bad
> deviation in partition-usage, a heavily hit partition finally collects
> elements via get_hash_entry().  By the patch's way, similar thing
> happens via the REUSE-ASSIGN-REMOVE sequence. But buffers once used
> for something won't be freed until buffer invalidation. But bulk
> buffer invalidation won't deviatedly distribute freed buffers among
> partitions.  So I conclude for now that is a non-issue.
> 
> So my opinion on the counters is:
> 
> I'd like to ask you to remove nalloced from partitions then add a
> global atomic for the same use?

I really believe it should be global. I made it per-partition to
not overcomplicate first versions. Glad you tell it.

I thought to protect it with freeList[0].mutex, but probably atomic
is better idea here. But which atomic to chose: uint64 or uint32?
Based on sizeof(long)?
Ok, I'll do in next version.

Whole get_hash_entry look strange.
Doesn't it better to cycle through partitions and only then go to
get_hash_entry?
May be there should be bitmap for non-empty free lists? 32bit for
32 partitions. But wouldn't bitmap became contention point itself?

> No need to do something for the possible deviation issue.

-------

regards
Yura Sokolov




pgsql-hackers by date:

Previous
From: Kyotaro Horiguchi
Date:
Subject: Re: BufferAlloc: don't take two simultaneous locks
Next
From: Kyotaro Horiguchi
Date:
Subject: Re: [PATCH] Accept IP addresses in server certificate SANs