Re: a very minor bug and a couple of comment changes for basebackup.c - Mailing list pgsql-hackers
From: Stephen Frost
Subject: Re: a very minor bug and a couple of comment changes for basebackup.c
Date:
Msg-id: ZAZ+pjj5yD2XqbFE@tamriel.snowman.net
In response to: Re: a very minor bug and a couple of comment changes for basebackup.c (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
Greetings,

* Robert Haas (robertmhaas@gmail.com) wrote:
> Thanks for the review. I have committed the patches.

No objections to what was committed.

> On Thu, Mar 2, 2023 at 2:59 AM Michael Paquier <michael@paquier.xyz> wrote:
> > There is more to it: the page LSN is checked before its checksum.
> > Hence, if the page's LSN is corrupted in a way that it is higher than
> > sink->bbs_state->startptr, the checksum verification is just skipped
> > while the page is broken but not reported as such. (Spoiler: this has
> > been mentioned in the past, and maybe we'd better remove this stuff in
> > its entirety.)
>
> Yep. It's pretty embarrassing that we have a checksum verification
> feature that has both known ways of producing false positives and
> known ways of producing false negatives and we have no plan to ever
> fix that, we're just going to keep shipping what we've got. I think
> it's pretty clear that the feature shouldn't have been committed like
> this; valid criticisms of the design were offered and simply not
> addressed, not even by updating the comments or documentation with
> disclaimers. I find the code in sendFile() pretty ugly, too. For all
> of that, I'm a bit uncertain whether ripping it out is the right thing
> to do. It might be (I'm not sure) that it tends to work decently well
> in practice. Like, yes, it could produce false checksum warnings, but
> does that actually happen to people? It's probably not too likely that
> a read() or write() of 8kB gets updated after doing only part of the
> I/O, so the concurrent read or write is fairly likely to be on-CPU, I
> would guess, and therefore this algorithm might kind of work OK in
> practice despite being garbage on a theoretical level. Similarly, the
> problems with how the LSN is vetted make it likely that a page
> replaced with random garbage will go undetected, but a page where a
> few bytes get flipped in a random position within the page is likely
> to get caught, and maybe the latter is actually a bigger risk than the
> former. I don't really know. I'd be interested to hear from anyone
> with a lot of practical experience using the feature. A few anecdotes
> of the form "this routine fails to tell us about problems" or "this
> often complains about problems that are not real" or "this has really
> helped us out on a number of occasions and we have had no problems
> with it" would be really helpful.

The concern about the LSN is certainly a valid one, and we ended up ripping that check out of pgbackrest in favor of a different approach: simply re-read the page and see if it changed. If it changed, then we punt and figure that it was a hot page that PG was actively writing to, so it'll be in the WAL and we don't have to worry about it. We're generally more concerned about latent on-disk corruption being missed than about some kind of in-memory corruption, and the page changing in the filesystem cache without some other process writing to it just seems awfully unlikely.

https://github.com/pgbackrest/pgbackrest/commit/9eec98c61302121134d2067326dbd2cd0f2f0b9c

From a practical perspective, while this has the aforementioned risk of our loop running twice fast enough that it re-reads the same page without the I/O on the page progressing at all, I'm not aware of even one report of that actually happening. We absolutely have seen cases where the first read picks up a torn page, and that's not even uncommon in busy environments.
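To make the re-read approach concrete, here is a minimal sketch of that retry logic; the helper names (read_block, checksum_ok) and the result enum are hypothetical, not the actual pgbackrest code, which lives in the commit linked above.

/*
 * Illustrative sketch only: names are hypothetical, not taken from
 * pgbackrest or the backend (see the commit linked above for the real
 * implementation).
 */
#include <stdbool.h>
#include <string.h>

#define BLCKSZ 8192

/* Assumed helpers: read one 8kB block, and verify its checksum. */
extern bool read_block(int fd, unsigned int blkno, char *buf);
extern bool checksum_ok(const char *buf, unsigned int blkno);

typedef enum
{
    PAGE_OK,                    /* checksum verified */
    PAGE_HOT,                   /* page changed between reads; assume a torn
                                   read of a hot page, covered by WAL replay */
    PAGE_CORRUPT                /* stable across both reads, checksum still bad */
} PageVerifyResult;

static PageVerifyResult
verify_page(int fd, unsigned int blkno)
{
    char        first[BLCKSZ];
    char        second[BLCKSZ];

    if (!read_block(fd, blkno, first))
        return PAGE_CORRUPT;    /* short read; treat as an error */

    if (checksum_ok(first, blkno))
        return PAGE_OK;

    /*
     * Checksum failed: re-read the page.  If it changed, PG was writing it
     * concurrently, so punt; the final contents will be in the WAL.
     */
    if (!read_block(fd, blkno, second))
        return PAGE_CORRUPT;

    if (memcmp(first, second, BLCKSZ) != 0)
        return PAGE_HOT;

    return checksum_ok(second, blkno) ? PAGE_OK : PAGE_CORRUPT;
}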
One of the ideas I've had around how to address all of this is to not depend on inferring things from what's been written out, but instead to just ask PG. That's one of the (relatively few...) benefits that I see to an archive_library: the ability to check if a given page is in shared buffers and, if so, what its LSN is, to see if it's past the start of the backup. Given that pg_basebackup is already working with a lot of server-side code, perhaps this could be a direction to go in for it.

Another idea we played around with was keeping track of the LSNs of the pages with invalid checksums and checking that they fall within the backup's start LSN to end LSN range; while that wouldn't completely prevent corrupted LSNs from skipping detection using the LSN approach, it'd sure make it a whole lot less likely.

Lastly, I've also wondered about trying to pull (clean) pages from shared_buffers directly instead of reading them off of disk at all, perhaps using a ring buffer in shared_buffers to read them in if they're not already there, and letting all the usual checksum validation and such happen with PG handling it. Dirty pages would have to be handled though, presumably by locking them and then reading the page from disk, as I don't think folks would be pleased if we decided to forcibly write them out. For an incremental backup, if you have reliable knowledge from PG that the page is in the WAL, perhaps you could even skip the page entirely.

Independently of verifying PG checksums, we've also considered an approach where we take the file-level checksums we have in the pgbackrest manifest and check whether the file has changed on the filesystem without its timestamp changing, as that would certainly be indicative of something having gone wrong.

Certainly interested in other ideas about how to improve things in this area. I don't think we should rip it out, though.

Thanks,

Stephen
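P.S. A minimal sketch of the LSN-range check mentioned above, with hypothetical names (not from pgbackrest or PostgreSQL): a page that fails checksum verification is only excused as concurrently written if its LSN falls within the backup's start/end LSN range; anything outside that range gets reported.

/*
 * Sketch only: names are hypothetical.  For a page whose checksum did not
 * verify, accept the "concurrently written" explanation only when its LSN
 * falls within the backup's start/end LSN range; otherwise report it.
 */
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t Lsn;           /* stand-in for XLogRecPtr */

typedef struct
{
    Lsn         start_lsn;      /* backup start LSN */
    Lsn         end_lsn;        /* backup end (stop) LSN */
} BackupLsnRange;

static bool
failed_page_is_plausibly_hot(Lsn page_lsn, const BackupLsnRange *range)
{
    /*
     * A genuinely concurrent write during the backup must carry an LSN
     * between the backup's start and end; a "future" or wildly old LSN on
     * a checksum-failing page is treated as corruption rather than skipped.
     */
    return page_lsn >= range->start_lsn && page_lsn <= range->end_lsn;
}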