Re: Exceptional md.c paths for recovery and zero_damaged_pages - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: Exceptional md.c paths for recovery and zero_damaged_pages |
Date | |
Msg-id | kzmowqwplg5jwl5gu5b6u6fj7glo2xyqfsue4h4xtww2xfgn3u@xufsmkqi7fjm Whole thread Raw |
In response to | Re: Exceptional md.c paths for recovery and zero_damaged_pages (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-hackers |
Hi, On 2024-12-13 19:06:16 -0500, Tom Lane wrote: > Andres Freund <andres@anarazel.de> writes: > > We have a fair number of special paths in md.c that are specific to > > recovery. E.g. in mdreadv() we do: > > ... > > As far as I can tell, nearly all - including the above - InRecovery paths in > > md.c are basically unreachable. And have been for quite a while. > > > XLogReadBufferExtended() takes care to > > a) Create the fork if it doesn't yet exist. > > b) Not to read from beyond EOF. If EOF is found, we extend the relation to be > > large enough. > > Which afaict should suffice to prevent needing the above? > > I haven't checked the git history, but I suspect this logic is later > than the md.c code you mention, and may well render it obsolete. Oddly enough, not really - the logic originated in the original REDO/UNDO commit, in 2000 (b58c0411bad4). > > The InRecovery paths for _mdfd_getseg seem to originate in 2004's 303e46ea932 > > and the zero-beyond-eof seems to be from 2007's ef07221997e - although it was > > just *restricted* to InRecovery in that commit. > > We definitely needed 303e46ea932 at the time, but that doesn't mean > we still do. I'd assume it was needed at the time, I just can't quite figure out by what. We used to have more code using the fake relcache stuff, which indicates that those places weren't going through extend-the-file logic in XLogReadBufferExtended(). E.g. the "incomplete split" logic that Heikki removed from a bunch of indexes. Another theory is that it could have been related to wal_level=minimal. Before Noah's redesign in c6b92041d38 there were rather weird mixes of logged and unlogged modifications to the same relfilenode. I don't quite see how it could have required this behaviour in mdreadv() though. > Possibly ef07221997e was just copying the earlier logic. It did note this: Also, remove the kluge of allowing mdread() to return zeroes from a read-beyond-EOF: this is now an error condition except when InRecovery or zero_damaged_pages = true. (Hash indexes used to require that behavior, but no more.) which is about this hunk: /* - * If we are at or past EOF, return zeroes without complaining. Also - * substitute zeroes if we found a partial block at EOF. - * - * XXX this is really ugly, bad design. However the current - * implementation of hash indexes requires it, because hash index - * pages are initialized out-of-order. + * Short read: we are at or past EOF, or we read a partial block at + * EOF. Normally this is an error; upper levels should never try to + * read a nonexistent block. However, if zero_damaged_pages is ON + * or we are InRecovery, we should instead return zeroes without + * complaining. This allows, for example, the case of trying to + * update a block that was later truncated away. */ Before this there was no mention of recovery, so perhaps the InRecovery bit was just trying to not break recovery (there weren't any tests for that at the time, I think, so that'd be quite resonable). I'm wondering if we should just put an error into the relevant paths in HEAD and see whether it triggers for anybody in the next months. Having all these untested paths in md.c forever doesn't seem great. Greetings, Andres Freund
pgsql-hackers by date: