Re: BufferAlloc: don't take two simultaneous locks - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject Re: BufferAlloc: don't take two simultaneous locks
Date
Msg-id 20220419.104515.841299122302516369.horikyota.ntt@gmail.com
Whole thread Raw
In response to Re: BufferAlloc: don't take two simultaneous locks  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
At Mon, 18 Apr 2022 09:53:42 -0400, Robert Haas <robertmhaas@gmail.com> wrote in 
> On Fri, Apr 15, 2022 at 4:29 AM Kyotaro Horiguchi
> <horikyota.ntt@gmail.com> wrote:
> > The patch removes buftable entry frist then either inserted again or
> > returned to freelist.  I don't understand how it can be in both
> > buftable and freelist..  What kind of trouble do you have in mind for
> > example?
> 
> I'm not sure. I'm just thinking about potential dangers. I was more
> worried about it ending up in neither place.

I think that is more likely to happen.  But I think that that can
happen also by the current code if it had exits on the way. And the
patch does not add a new exit.

> > So, does this get progressed if someone (maybe Yura?) runs a
> > benchmarking with this method?
> 
> I think we're talking about theoretical concerns about safety here,
> and you can't resolve that by benchmarking. Tom or others may have a

Yeah.. I didn't mean that benchmarking resolves the concerns.  I meant
that if benchmarking shows that the safer (or cleaner) way give
sufficient gain, we can take that direction.

> different view, but IMHO the issue with this patch isn't that there
> are no performance benefits, but that the patch needs to be fully
> safe. He and I may disagree on how likely it is that it can be made
> safe, but it can be a million times faster and if it's not safe it's
> still dead.

Right.

> Something clearly needs to be done to plug the specific problem that I
> mentioned earlier, somehow making it so we never need to grow the hash
> table at runtime. If anyone can think of other such hazards those also
> need to be fixed.

- Running out of buffer mapping entries?

It seems to me related to "runtime growth of the table mapping hash
table". Does the runtime growth of the hash mean that get_hash_entry
may call element_alloc even if the hash is created with a sufficient
number of elements?  If so, it's not the fault of this patch.  We can
search all freelists before asking element_alloc() (maybe) in exchange
of some extent of potential temporary degradation.  That being said, I
don't think it's good that we call element_alloc for shared hashes
after creation.

- Is the collision handling correct that just returning the victimized
  buffer to freelist?

Potentially the patch can over-vicitimzie buffers up to
max_connections-1.  Is this what you are concerned about?  A way to
preveint over-victimization was raised upthread, that is, we insert a
special table mapping entry that signals "this page is going to be
available soon." before releasing newPartitionLock. This prevents
over-vicitimaztion.

- Doesn't buffer-leak or duplicate mapping happen?

This patch does not change the order of the required steps, and
there's no exit on the way (if the current code doesn't have.). No two
processes victimize the same buffer since the victimizing steps are
protected by oldPartitionLock (and header lock) same as the current
code, and no two processes insert different buffers for the same page
since the inserting steps are protected by newPartitionLock. No
vicitmized buffer gets orphaned *if* that doesn't happen by the
current code.  So *I* am at loss how *I* can make it clear that they
don't happenX( (Of course Yura might think differently.)

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



pgsql-hackers by date:

Previous
From: Masahiko Sawada
Date:
Subject: Re: Logical replication timeout problem
Next
From: Kyotaro Horiguchi
Date:
Subject: Re: Add --{no-,}bypassrls flags to createuser