Re: AIO v2.5 - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: AIO v2.5 |
Date | |
Msg-id | brd4ebxlbnnehjb2jifkme4hovdmxszxqnnfbuoblrarkgvepu@ydfqaoe3zime Whole thread Raw |
In response to | Re: AIO v2.5 (Noah Misch <noah@leadboat.com>) |
Responses |
Re: AIO v2.5
|
List | pgsql-hackers |
Hi, On 2025-03-24 19:20:37 -0700, Noah Misch wrote: > On Thu, Mar 20, 2025 at 09:58:37PM -0400, Andres Freund wrote: > > static void > > TerminateBufferIO(BufferDesc *buf, bool clear_dirty, uint32 set_flag_bits, > > - bool forget_owner) > > + bool forget_owner, bool syncio) > > ... > Looking at the callers: > > ZeroAndLockBuffer[1083] TerminateBufferIO(bufHdr, false, BM_VALID, true, true); > ExtendBufferedRelShared[2869] TerminateBufferIO(buf_hdr, false, BM_VALID, true, true); > FlushBuffer[4827] TerminateBufferIO(buf, true, 0, true, true); > AbortBufferIO[6637] TerminateBufferIO(buf_hdr, false, BM_IO_ERROR, false, true); > buffer_readv_complete_one[7279] TerminateBufferIO(buf_hdr, false, set_flag_bits, false, false); > buffer_writev_complete_one[7427] TerminateBufferIO(buf_hdr, clear_dirty, set_flag_bits, false, false); > > I think we can improve on the "syncio" arg name. The first two aren't doing > IO, and AbortBufferIO() may be cleaning up what would have been an AIO if it > hadn't failed early. Perhaps name the arg "release_aio" and pass > release_aio=true instead of syncio=false (release_aio = !syncio). Yes, I think that makes sense. Will do that tomorrow. > > +static pg_attribute_always_inline PgAioResult > > +buffer_readv_complete_one(uint8 buf_off, Buffer buffer, uint8 flags, > > + bool failed, bool is_temp) > > +{ > ... > > + if ((flags & READ_BUFFERS_ZERO_ON_ERROR) || zero_damaged_pages) > > + { > > + ereport(WARNING, > > + (errcode(ERRCODE_DATA_CORRUPTED), > > + errmsg("invalid page in block %u of relation %s; zeroing out page", > > My earlier review requested s/LOG/WARNING/, but I wasn't thinking about this > in full depth. In the !is_temp case, this runs in a complete_shared > callback. A process unrelated to the original IO may run this callback. > That's unfortunate in two ways. First, that other process's client gets an > unexpected WARNING. The process getting the WARNING may not even have > zero_damaged_pages enabled. Second, the client of the process that staged > the IO gets no message. Ah, right. That could be why I had flipped it. If so, shame on me for not adding a comment... > AIO ERROR-level messages handle this optimally. We emit a LOG-level message > in the process that runs the complete_shared callback, and we arrange for the > ERROR-level message in the stager. That would be ideal here: LOG in the > complete_shared runner, WARNING in the stager. We could obviously downgrade (crossgrade? A LOG is more severe than a LOG in some ways, but not others) the message when run in a different backend fairly easily. Still emitting a WARNING in the stager however is a bit more tricky. Before thinking more deeply about how we could emit WARNING in the stage: Is it actually sane to use WARNING here? At least for ZERO_ON_ERROR that could trigger a rather massive flood of messages to the client in a *normal* situation. I'm thinking of something like an insert extending a relation some time after an immediate restart and encountering a lot of FSM corruption (due to its non-crash-safe-ness) during the search for free space and the subsequent FSM vacuum. It might be ok to LOG that, but sending a lot of WARNINGs to the client seems not quite right. If we want to implement it, I think we could introduce PGAIO_RS_WARN, which then could tell the stager to issue the WARNING. It would add a bit of distributed cost, both to callbacks and users of AIO, but it might not be too bad. > One could simplify things by forcing io_method=sync under ZERO_ON_ERROR || > zero_damaged_pages, perhaps as a short-term approach. Yea, that could work. Perhaps even just for zero_damaged_pages, after changing it so that ZERO_ON_ERROR always just LOGs. Hm, it seems somewhat nasty to have rather different performance characteristics when forced to use zero_damaged_pages to recover from a problem. Imagine an instance that's configured to use DIO and then needs to use zero_damaged_pages to recove from corruption... /me adds writing a test for both ZERO_ON_ERROR and zero_damaged_pages to the TODO. Greetings, Andres Freund
pgsql-hackers by date: