Re: AIO v2.5 - Mailing list pgsql-hackers

From Andres Freund
Subject Re: AIO v2.5
Date
Msg-id ftg76ptgts4h353aztjcdoq2kc2ournuxqeypd4t5aovbm4mxz@qur2qrt2k3oy
Whole thread Raw
In response to Re: AIO v2.5  (Noah Misch <noah@leadboat.com>)
List pgsql-hackers
Hi,


On 2025-04-05 06:43:52 -0700, Noah Misch wrote:
> Yeah.  Maybe this (untested):

Something like that works. I adopted your formulation of this, mine was in
GetLocalVictimBuffer(), which seems slightly less future proof.


> > > If that's right, it would still be nice to reach the right
> > > VALGRIND_MAKE_MEM_DEFINED() without involving bufmgr.
> >
> > I think that would be possible if we didn't do VALGRIND_MAKE_MEM_NOACCESS() in
> > UnpinBuffer()/UnpinLocalBuffer(). But with that I don't see how we can avoid
> > needing to remark the region as accessible?
>
> Yes, it's not that we should remove VALGRIND_MAKE_MEM_DEFINED() from bufmgr.
> I was trying to think about future AIO callers (e.g. RelationCopyStorage())
> and how they'd want things to work.

Ah, I now understand what you mean.  I'm inclined to leave that out for now,
we can do that later. I spent a bit of time experimenting with going bigger,
but I think it's important to get skink a bit less red.


> That said, perhaps we should just omit the io_uring-level Valgrind calls and
> delegate the problem to higher layers until there's a concrete use case:

Yes, I think that's the right answer for now. Applied your suggested comment.


I looked for a good place to add a comment to method_io_uring.c, but couldn't
really come up with anything convincing.  Then I thought the
 * "Start" routines for individual IO operations
comment in aio_io.c might be a good place, but also failed to come up with
anything particularly convincing.


> > > - A complete_local callback solves those problems.  However, if the
> > >   AIO-defining subxact aborted, then we shouldn't set DEFINED at all, since
> > >   the buffer mapping may have changed by the time of complete_local.
> >
> > I don't think that is possible, due to the aio subsystem owned pin?
>
> We currently drop the shared buffer AIO subsystem pin in complete_shared
> (buffer_readv_complete_one() -> TerminateBufferIO()).  I was trying to say
> that if we put a VALGRIND_MAKE_MEM_DEFINED in complete_local without changing
> anything else, it would have this problem.

Ah, yes, I see what you mean.


> We probably wouldn't want to move the shared buffer pin drop to
> complete_local without a strong reason.

Agreed.


> > I think the point about non-aio uses is a fair one, but I don't quite know how
> > to best solve it right now, due to the local buffer issue you mentioned. I'd
> > guess that we'd best put it somewhere
> > a) in pgaio_io_process_completion(), if definer==completor || !PGAIO_HF_REFERENCES_LOCAL
> > b) pgaio_io_call_complete_local(), just before calling
> >    pgaio_io_call_complete_local() if PGAIO_HF_REFERENCES_LOCAL
>
> I think that would do nothing wrong today, but it uses
> !PGAIO_HF_REFERENCES_LOCAL as a proxy for "target memory is already DEFINED,
> or a higher layer will make it DEFINED".  While !PGAIO_HF_REFERENCES_LOCAL
> does imply that today, I think that's a bufmgr-specific conclusion.  I have no
> particular reason to expect that to hold for future AIO use cases.  As above,
> I'd be inclined to omit the io_uring-level Valgrind calls until we have a
> concrete use case to drive their design.  How do you see it?

Yea, I'm not sure either.  I think we'll best wait until we have non-bufmgr
AIO to address all this. That'll make it easier to shake out.



I think eventually we should do an explicit
- VALGRIND_CHECK_MEM_IS_ADDRESSABLE() in pgaio_io_start_readv

- VALGRIND_CHECK_MEM_IS_DEFINED() in pgaio_io_start_writev

- VALGRIND_MAKE_MEM_DEFINED() in when a READV completes, although some of the
  details of when/where to do that aren't entirely clear to me yet. I suspect
  we might have to do it in pgaio_io_start_readv(), because it's harder to
  reliably do it later


I've pushed the patches. Thanks for the discussion, somehow the mix of shared
memory with valgrind tracking accessibility/definedness in a process local
manner is somewhat mindbending.

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Masahiko Sawada
Date:
Subject: Re: pg_recvlogical cannot create slots with failover=true
Next
From: Melanie Plageman
Date:
Subject: Re: BAS_BULKREAD vs read stream