Re: AIO v2.5 - Mailing list pgsql-hackers
From | Noah Misch |
---|---|
Subject | Re: AIO v2.5 |
Date | |
Msg-id | 20250401160727.bd.nmisch@google.com Whole thread Raw |
In response to | Re: AIO v2.5 (Andres Freund <andres@anarazel.de>) |
Responses |
Re: AIO v2.5
|
List | pgsql-hackers |
On Tue, Apr 01, 2025 at 11:55:20AM -0400, Andres Freund wrote: > On 2025-04-01 08:11:59 -0700, Noah Misch wrote: > > On Mon, Mar 31, 2025 at 08:41:39PM -0400, Andres Freund wrote: > I haven't yet pushed the changes, but will work on that in the afternoon. > > I plan to afterwards close the CF entry and will eventually create a new one > for write support, although probably only rebasing onto > https://postgr.es/m/stj36ea6yyhoxtqkhpieia2z4krnam7qyetc57rfezgk4zgapf%40gcnactj4z56m > and addressing some of the locking issues. Sounds good. > WRT the locking issues, I've been wondering whether we could make > LWLockWaitForVar() work that purpose, but I doubt it's the right approach. > Probably better to get rid of the LWLock*Var functions and go for the approach > I had in v1, namely a version of LWLockAcquire() with a callback that gets > called between LWLockQueueSelf() and PGSemaphoreLock(), which can cause the > lock acquisition to abort. What are the best thing(s) to read to understand the locking issues? > > > +# Verify checksum handling when creating database from an invalid database. > > > +# This also serves as a minimal check that cross-database IO is handled > > > +# reasonably. > > > > To me, "invalid database" is a term of art from the message "cannot connect to > > invalid database". Hence, I would change "invalid database" to "database w/ > > invalid block" or similar, here and below. (Alternatively, just delete "from > > an invalid database". It's clear from the context.) > > Yea, I agree, this is easy to misunderstand when stepping back. I went for "with > an invalid block". Sounds good. > > > + if (corrupt_checksum) > > > + { > > > + bool successfully_corrupted = 0; > > > + > > > + /* > > > + * Any single modification of the checksum could just end up being > > > + * valid again. To be sure > > > + */ > > > > Unfinished sentence. > > Oops. See below. > > > > That said, I'm not following why we'd need this loop. If this test code > > were changing the input to the checksum, it's true that an input bit flip > > might reach the same pd_checksum. The test case is changing pd_checksum, > > not the input bits. > > We might be changing the input, due to the zero/corrupt_header options. Or we > might be called on a page that is *already* corrupted. I did encounter that > situation once while writing tests, where the tests only passed if I made the > + 1 a + 2. Which was, uh, rather confusing and left me feel like I was cursed > that day. Got it. > > I don't see how changing pd_checksum could leave the > > page still valid. There's only one valid pd_checksum value for a given > > input page. > > I updated the comment to: > /* > * Any single modification of the checksum could just end up being > * valid again, due to e.g. corrupt_header changing the data in a way > * that'd result in the "corrupted" checksum, or the checksum already > * being invalid. Retry in that, unlikely, case. > */ Works for me.
pgsql-hackers by date: