Hi,
On 2024-10-30 10:47:35 -0700, Jeff Davis wrote:
> On Tue, 2024-09-24 at 11:55 -0400, Andres Freund wrote:
> > What I suspect we might want instead is something in between a share
> > and an
> > exclusive lock, which is taken while setting a hint bit and which
> > conflicts
> > with having an IO in progress.
>
> I am starting to wonder if shared content locks are really the right
> concept at all. It makes sense for simple mutexes, but we are already
> more complex than that, and you are suggesting adding on to that
> complexity.
What I am proposing doesn't make the content lock more complicated; it's
orthogonal to the content lock.
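To make "orthogonal" a bit more concrete, here is a rough sketch of a
per-buffer state word that lives next to the content lock rather than inside
it. Setting a hint bit and having an IO in progress exclude each other, while
any number of hint-bit setters can coexist. This is not code from any patch:
SketchBuffer, the sketch_* functions and the bit layout are all made up for
illustration, and a real version would presumably wait or skip rather than
just return false.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state word, kept separate from the content lock. */
#define IO_IN_PROGRESS   (UINT32_C(1) << 31)    /* set while IO is running */
#define HINT_SETTER_MASK (IO_IN_PROGRESS - 1)   /* count of hint-bit setters */

typedef struct
{
    _Atomic uint32_t state;
} SketchBuffer;

/* Try to register as a hint-bit setter; fails while IO is in progress. */
static bool
sketch_begin_set_hint(SketchBuffer *buf)
{
    uint32_t old = atomic_load(&buf->state);

    for (;;)
    {
        if (old & IO_IN_PROGRESS)
            return false;       /* caller would simply skip setting the hint */
        /* on failure, 'old' is refreshed with the current value and we retry */
        if (atomic_compare_exchange_weak(&buf->state, &old, old + 1))
            return true;
    }
}

static void
sketch_end_set_hint(SketchBuffer *buf)
{
    uint32_t prev = atomic_fetch_sub(&buf->state, 1);

    assert((prev & HINT_SETTER_MASK) > 0);
    (void) prev;
}

/* Try to start IO; fails if a setter is active or IO is already running. */
static bool
sketch_begin_io(SketchBuffer *buf)
{
    uint32_t expected = 0;

    return atomic_compare_exchange_strong(&buf->state, &expected,
                                          IO_IN_PROGRESS);
}

static void
sketch_end_io(SketchBuffer *buf)
{
    atomic_fetch_and(&buf->state, ~IO_IN_PROGRESS);
}
```

Note that nothing here touches share or exclusive content locking; a backend
holding a share lock would call sketch_begin_set_hint() in addition to it.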
> Which I agree is a good idea, I'm just wondering if we could go even
> further.
>
> The README states that a shared lock is necessary for visibility
> checking, but can we just be more careful with the ordering and
> atomicity of visibility changes in the page?
>
> * carefully order reads and writes of xmin/xmax/hints (would
> that create too many read barriers in the tqual.c code?)
> * write line pointer after tuple is written
It's possible, but it'd be a lot of work. And you'd need to do this not just
for heap, but for all the indexes too, to make progress on the
don't-set-hint-bits-during-io front. So I don't think it makes sense to tie
these things together.
I do think that it's an argument for not importing all the complexity into
lwlock.c though.
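For reference, the "write line pointer after tuple is written" ordering from
the quoted list boils down to a release/acquire publication pattern. Below is
a minimal sketch using C11 atomics. It is not PostgreSQL code: SketchPage, the
single item_offset slot and the sketch_* functions are hypothetical, and it
deliberately ignores xmin/xmax/hint ordering, which is where most of the work
would be.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 256

/* Hypothetical page with one "line pointer"; offset 0 means unused. */
typedef struct
{
    _Atomic uint16_t item_offset;
    unsigned char data[SKETCH_PAGE_SIZE];
} SketchPage;

/* Writer: copy the tuple bytes first, then publish the line pointer. */
static void
sketch_insert(SketchPage *page, uint16_t off, const void *tuple, size_t len)
{
    assert(off > 0 && off + len <= SKETCH_PAGE_SIZE);

    memcpy(page->data + off, tuple, len);

    /* release: the tuple bytes become visible before the pointer does */
    atomic_store_explicit(&page->item_offset, off, memory_order_release);
}

/* Reader: an acquire load pairs with the writer's release store. */
static const unsigned char *
sketch_lookup(SketchPage *page)
{
    uint16_t off = atomic_load_explicit(&page->item_offset,
                                        memory_order_acquire);

    return off == 0 ? NULL : page->data + off;
}
```

A reader that sees a non-zero offset is guaranteed to see the complete tuple
bytes. Doing the equivalent for every field that visibility checks look at,
in heap and in each index AM, is the part that adds up.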
> We would still have pins and cleanup locks to prevent data removal.
As-is, cleanup locks only work in coordination with content locks. While
cleanup is ongoing we need to prevent anybody from starting to look at the
page - without acquiring something like a shared lock, that's not easy.
> We'd have the logic you suggest that would prevent modification during
> IO. And there would still need to be an exclusive content lock so that
> two inserts don't try to allocate the same line pointer, or lock the
> same tuple.
>
> If PD_ALL_VISIBLE is set it's even simpler.
>
> Am I missing some major hazards?
I don't think anything fundamental, but it's a decidedly nontrivial amount of
work.
Greetings,
Andres Freund