Re: AIO v2.5 - Mailing list pgsql-hackers

From: Noah Misch
Subject: Re: AIO v2.5
Msg-id: 20250325141120.8e.nmisch@google.com
In response to: Re: AIO v2.5 (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers
On Mon, Mar 24, 2025 at 10:52:19PM -0400, Andres Freund wrote:
> On 2025-03-24 19:20:37 -0700, Noah Misch wrote:
> > On Thu, Mar 20, 2025 at 09:58:37PM -0400, Andres Freund wrote:
> > > +static pg_attribute_always_inline PgAioResult
> > > +buffer_readv_complete_one(uint8 buf_off, Buffer buffer, uint8 flags,
> > > +                          bool failed, bool is_temp)
> > > +{
> > ...
> > > +        if ((flags & READ_BUFFERS_ZERO_ON_ERROR) || zero_damaged_pages)
> > > +        {
> > > +            ereport(WARNING,
> > > +                    (errcode(ERRCODE_DATA_CORRUPTED),
> > > +                     errmsg("invalid page in block %u of relation %s; zeroing out page",
> >
> > My earlier review requested s/LOG/WARNING/, but I wasn't thinking about this
> > in full depth.  In the !is_temp case, this runs in a complete_shared
> > callback.  A process unrelated to the original IO may run this callback.
> > That's unfortunate in two ways.  First, that other process's client gets an
> > unexpected WARNING.  The process getting the WARNING may not even have
> > zero_damaged_pages enabled.  Second, the client of the process that staged
> > the IO gets no message.
>
> Ah, right.  That could be why I had flipped it.  If so, shame on me for not
> adding a comment...
>
> > AIO ERROR-level messages handle this optimally.  We emit a LOG-level message
> > in the process that runs the complete_shared callback, and we arrange for the
> > ERROR-level message in the stager.  That would be ideal here: LOG in the
> > complete_shared runner, WARNING in the stager.
>
> We could obviously downgrade (crossgrade?  A LOG is more severe than a
> WARNING in some ways, but not others) the message when run in a different
> backend fairly easily.  Still emitting a WARNING in the stager however is a
> bit more tricky.
>
> Before thinking more deeply about how we could emit WARNING in the stager:
>
> Is it actually sane to use WARNING here?  At least for ZERO_ON_ERROR that
> could trigger a rather massive flood of messages to the client in a *normal*
> situation.  I'm thinking of something like an insert extending a relation
> some time after an immediate restart and encountering a lot of FSM corruption
> (due to its non-crash-safe-ness) during the search for free space and the
> subsequent FSM vacuum.  It might be ok to LOG that, but sending a lot of
> WARNINGs to the client seems not quite right.

Orthogonal to AIO, I do think LOG (or even DEBUG1?) is better for
ZERO_ON_ERROR.  The ZERO_ON_ERROR case also should not use
ERRCODE_DATA_CORRUPTED.  (That errcode shouldn't appear for business as usual.
It should signify wrong or irretrievable query results, essentially.)  For
zero_damaged_pages, WARNING seems at least defensible, and
ERRCODE_DATA_CORRUPTED is right.

It wouldn't be the worst thing to change zero_damaged_pages to LOG and let the
complete_shared runner log it, as long as we release-note that.  It's
superuser-only, and the superuser can learn to check the log.  One typically
should use zero_damaged_pages in one session at a time, so the logs won't be
too confusing.

Another thought on complete_shared running on other backends: I wonder if we
should push an ErrorContextCallback that adds "CONTEXT: completing I/O of
other process" or similar, so people wonder less about how "SELECT FROM a" led
to a log message about IO on table "b".

> If we want to implement it, I think we could introduce PGAIO_RS_WARN, which
> then could tell the stager to issue the WARNING.  It would add a bit of
> distributed cost, both to callbacks and users of AIO, but it might not be too
> bad.
>
> > One could simplify things by forcing io_method=sync under ZERO_ON_ERROR ||
> > zero_damaged_pages, perhaps as a short-term approach.
>
> Yea, that could work.  Perhaps even just for zero_damaged_pages, after
> changing it so that ZERO_ON_ERROR always just LOGs.

Yes.

> Hm, it seems somewhat nasty to have rather different performance
> characteristics when forced to use zero_damaged_pages to recover from a
> problem.  Imagine an instance that's configured to use DIO and then needs to
> use zero_damaged_pages to recover from corruption...

True.  I'd be willing to bet high-scale use of zero_damaged_pages is rare.  By
high scale, I mean something like reading a whole large table, as opposed to a
TID scan of the known-problematic range.  That said, people (including me)
expect the emergency tools to be good even if they're used rarely.  You're not
wrong to worry about it.