Re: AIO v2.5 - Mailing list pgsql-hackers

From Noah Misch
Subject Re: AIO v2.5
Date
Msg-id 20250330024513.ac.nmisch@google.com
Whole thread Raw
In response to Re: AIO v2.5  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On Sat, Mar 29, 2025 at 08:39:54PM -0400, Andres Freund wrote:
> On 2025-03-29 14:29:29 -0700, Noah Misch wrote:
> > On Wed, Mar 26, 2025 at 09:07:40PM -0400, Andres Freund wrote:

> > The choice between LOG and LOG_SERVER_ONLY doesn't matter much for $SUBJECT.
> > If a client has decided to set client_min_messages that high, the client might
> > be interested in the fact that it got side-tracked completing someone else's
> > IO.  On the other hand, almost none of those sidetrack events will produce
> > messages.  The main argument I'd envision for LOG_SERVER_ONLY is that we
> > consider the message content sensitive, but I don't see the message content as
> > materially sensitive.
> 
> I don't think it's sensitive - it just seems a bit sillier to send the same
> thing to the client twice than to the server log.

Ah, that adds weight to the benefit of LOG_SERVER_ONLY.

> I'm happy to change it to
> LOG if you prefer. Your points below mean some comments need to be updated in
> smgr/md.c anyway.

Nah.

> > > +/*
> > > + * smgrstartreadv() -- asynchronous version of smgrreadv()
> > > + *
> > > + * This starts an asynchronous readv IO using the IO handle `ioh`. Other than
> > > + * `ioh` all parameters are the same as smgrreadv().
> >
> > I would add a comment starting with:
> >
> >   Compared to smgrreadv(), more responsibilities fall on layers above smgr.
> >   Higher layers handle partial reads.  smgr will ereport(LOG_SERVER_ONLY) some
> >   problems, but higher layers are responsible for pgaio_result_report() to
> >   mirror that news to the user and (for ERROR) abort the (sub)transaction.
> 
> Hm - if we document that in all the smgrstart* we'd end up with something like
> that in a lot of places - but OTOH, this is the first one so far...

Alternatively, to avoid duplication:

  See $PLACE for the tasks that the caller's layer owns, in contrast to smgr
  owning them for smgrreadv().

> > I say "comment starting with", because I think there's a remaining decision
> > about who owns the zeroing currently tied to smgrreadv().  An audit of
> > mdreadv() vs. AIO counterparts found this part of mdreadv():
> >
> >             if (nbytes == 0)
> >             {
> >                 /*
> >                  * We are at or past EOF, or we read a partial block at EOF.
> >                  * Normally this is an error; upper levels should never try to
> >                  * read a nonexistent block.  However, if zero_damaged_pages
> >                  * is ON or we are InRecovery, we should instead return zeroes
> >                  * without complaining.  This allows, for example, the case of
> >                  * trying to update a block that was later truncated away.
> >                  */
> >                 if (zero_damaged_pages || InRecovery)
> >                 {
> >
> > I didn't write a test to prove its absence, but I'm not finding such code in
> > the AIO path.
> 
> Yes, there is no such codepath
> 
> A while ago I had started a thread about whether the above codepath is
> necessary

postgr.es/m/3qxxsnciyffyf3wyguiz4besdp5t5uxvv3utg75cbcszojlz7p@uibfzmnukkbd
which I had forgotten completely.

I've redone your audit, and I agree the InRecovery case is dead code.
check-world InRecovery reaches mdstartreadv() and mdreadv() only via
XLogReadBufferExtended(), vm_readbuf(), and fsm_readbuf().

The zero_damaged_pages case entails more of a judgment call about whether or
not its rule breakage eclipses its utility.  Fortunately, a disappointed
zero_damaged_pages user could work around that code's absence by stopping the
server and using "dd" to extend the segment with zeros.

> I had planned to put in an error into mdreadv() at the time, but somehow lost
> track of that - I kind of mentally put this issue into the "done" category :(

> At the very least we need to add a comment about this though.

I'm fine with either of:

1. Replace that mdreadv() code with an error.

2. Update comment in mdreadv() that we're phasing out this code due the
   InRecovery case's dead code status and the zero_damaged_pages problems; AIO
   intentionally doesn't replicate it.  Maybe add Assert(false).

I'd do (2) for v18, then do (1) first thing in v19 development.

> > > +    *zeroed_or_error_count = rem_error & ((1 << 7) - 1);
> > > +    rem_error >>= 7;
> >
> > These raw "7" are good places to use your new #define values.  Likewise in
> > buffer_readv_encode_error().
> 
> Which define value are you thinking of here? I don't think any of the ones I
> added apply?  But I think you're right it'd be good to have some define for
> it, at least locally.

It was just my imagination.  Withdrawn.

> It's now:
> 
> /*
>  * We need a backend-local completion callback for shared buffers, to be able
>  * to report checksum errors correctly. Unfortunately that can only safely
>  * happen if the reporting backend has previously called
>  * pgstat_prepare_report_checksum_failure(), which we can only guarantee in
>  * the backend that started the IO. Hence this callback.
>  */

Sounds good.

> Updated to:
>     /*
>      * Throw a WARNING/LOG, as instructed by PIV_LOG_*, if the checksum fails,
>      * but only after we've checked for the all-zeroes case.
>      */
> 
> I found one more, the newly added comment about checksum_failure_p was still
> talking about ignore_checksum_failure, but it should now be IGNORE_CHECKSUM_FAILURE.

That works.

> > > Subject: [PATCH v2.14 18/29] aio: Add test_aio module
> >
> > I didn't yet re-review the v2.13 or 2.14 changes to this one.  That's still in
> > my queue.
> 
> That's good - I think some of the tests need to expand a bit more. Since
> that's at the end of the dependency chain...

Okay, I'll delay on re-reviewing that one.  When it's a good time, please put
the CF entry back in Needs Review.  The patches before it are all ready for
commit after the above points of this mail.



pgsql-hackers by date:

Previous
From: Sutou Kouhei
Date:
Subject: Re: Make COPY format extendable: Extract COPY TO format implementations
Next
From: Tom Lane
Date:
Subject: Re: Amcheck verification of GiST and GIN