Re: AIO writes vs hint bits vs checksums - Mailing list pgsql-hackers

From Andres Freund
Subject Re: AIO writes vs hint bits vs checksums
Date
Msg-id yvcwpovzlfmmrnwlnlwb3g26fcogluolgjrqarotwxxxi4z2o4@5vx7p3lczkpf
In response to Re: AIO writes vs hint bits vs checksums  (Jeff Davis <pgsql@j-davis.com>)
List pgsql-hackers
Hi,

On 2024-10-30 10:47:35 -0700, Jeff Davis wrote:
> On Tue, 2024-09-24 at 11:55 -0400, Andres Freund wrote:
> > What I suspect we might want instead is something in between a share
> > and an
> > exclusive lock, which is taken while setting a hint bit and which
> > conflicts
> > with having an IO in progress.
> 
> I am starting to wonder if shared content locks are really the right
> concept at all. It makes sense for simple mutexes, but we are already
> more complex than that, and you are suggesting adding on to that
> complexity.

What I am proposing doesn't make the content lock more complicated; it's
orthogonal to the content lock.
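
To illustrate what "orthogonal" could look like, here is a minimal sketch of
a per-buffer state word in which hint-bit setters and IO exclude each other
without touching the content lock. Everything in it is an assumption for
illustration: the SketchBuf type, the sketch_* function names and the bit
layout are hypothetical, it uses C11 atomics rather than PostgreSQL's own
primitives, and it is not taken from any patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: top bit = IO in progress, low bits = active setters. */
#define SKETCH_IO_IN_PROGRESS (UINT32_C(1) << 31)

typedef struct SketchBuf
{
    _Atomic uint32_t state;
} SketchBuf;

/* Register as a hint-bit setter; fails if an IO is in progress. */
static bool
sketch_begin_set_hint(SketchBuf *buf)
{
    uint32_t old = atomic_load(&buf->state);

    for (;;)
    {
        if (old & SKETCH_IO_IN_PROGRESS)
            return false;       /* caller simply skips setting the hint */
        /* on failure, 'old' is refreshed with the current value */
        if (atomic_compare_exchange_weak(&buf->state, &old, old + 1))
            return true;
    }
}

/* Deregister a hint-bit setter. */
static void
sketch_end_set_hint(SketchBuf *buf)
{
    uint32_t old = atomic_fetch_sub(&buf->state, 1);

    /* there must have been at least one registered setter */
    assert((old & ~SKETCH_IO_IN_PROGRESS) > 0);
}

/* Start IO; fails while any setter is active or another IO is running. */
static bool
sketch_begin_io(SketchBuf *buf)
{
    uint32_t old = atomic_load(&buf->state);

    for (;;)
    {
        if (old != 0)
            return false;
        if (atomic_compare_exchange_weak(&buf->state, &old,
                                         SKETCH_IO_IN_PROGRESS))
            return true;
    }
}

/* Finish IO, allowing hint-bit setters again. */
static void
sketch_end_io(SketchBuf *buf)
{
    atomic_fetch_and(&buf->state, ~SKETCH_IO_IN_PROGRESS);
}
```

In this sketch both sides fail rather than wait. A real implementation would
have to decide who waits for whom (for example, whether IO waits for setters
to finish), which the sketch deliberately leaves open.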


> Which I agree is a good idea, I'm just wondering if we could go even
> further.
> 
> The README states that a shared lock is necessary for visibility
> checking, but can we just be more careful with the ordering and
> atomicity of visibility changes in the page?
> 
>  * carefully order reads and writes of xmin/xmax/hints (would
>    that create too many read barriers in the tqual.c code?)
>  * write line pointer after tuple is written

It's possible, but it'd be a lot of work.  And you would need to do this not
just for heap, but for all the indexes too, to make progress on the
don't-set-hint-bits-during-io front.  So I don't think it makes sense to tie
these things together.
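
For concreteness, the "write line pointer after tuple is written" idea
amounts to a publish pattern along the following lines. This is only a sketch
under stated assumptions: SketchPage, sketch_publish and sketch_lookup are
hypothetical names, the page has a single line pointer, and it uses C11
atomics rather than the barrier primitives PostgreSQL actually has. The real
heap and index page formats would need far more than this.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 256

/* Hypothetical one-item page: offset 0 means "line pointer unused". */
typedef struct SketchPage
{
    _Atomic uint16_t item_off;
    char        data[SKETCH_PAGE_SIZE];
} SketchPage;

/* Writer: copy the tuple first, then publish the line pointer. */
static void
sketch_publish(SketchPage *page, uint16_t off, const char *tuple, size_t len)
{
    assert(off != 0 && off + len <= SKETCH_PAGE_SIZE);

    memcpy(page->data + off, tuple, len);
    /* release: the tuple bytes become visible before the pointer does */
    atomic_store_explicit(&page->item_off, off, memory_order_release);
}

/* Reader: returns NULL until the line pointer has been published. */
static const char *
sketch_lookup(SketchPage *page)
{
    /* acquire: pairs with the release store in sketch_publish() */
    uint16_t off = atomic_load_explicit(&page->item_off,
                                        memory_order_acquire);

    return off == 0 ? NULL : page->data + off;
}
```

Every reader path pays for the acquire load, and every access method would
need its own version of this ordering, which is where the amount of work
comes from.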

I do think that it's an argument for not importing all the complexity into
lwlock.c though.


> We would still have pins and cleanup locks to prevent data removal.

As-is, cleanup locks only work in coordination with content locks. While
cleanup is ongoing we need to prevent anybody from starting to look at the
page, and without acquiring something like a shared lock that's not easy.


> We'd have the logic you suggest that would prevent modification during
> IO. And there would still need to be exclusive content locks so that
> two inserts don't try to allocate the same line pointer, or lock the
> same tuple.
> 
> If PD_ALL_VISIBLE is set it's even simpler.
> 
> Am I missing some major hazards?

I don't think anything fundamental, but it's a decidedly nontrivial amount of
work.

Greetings,

Andres Freund


