Re: Split xlog.c - Mailing list pgsql-hackers
From | Heikki Linnakangas |
---|---|
Subject | Re: Split xlog.c |
Date | |
Msg-id | a462d79c-cb5a-47cc-e9ac-616b5003965f@iki.fi Whole thread Raw |
In response to | Re: Split xlog.c (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: Split xlog.c
|
List | pgsql-hackers |
Here's a new version. It includes two new smaller commits, before the main refactoring: 1. Refactor setting XLP_FIRST_IS_OVERWRITE_CONTRECORD. I moved the code to set that flag from AdvanceXLInsertBuffer() into CreateOverwriteContrecordRecord(). That avoids the need for accessing the global variable in AdvanceXLInsertBuffer(), which is nice with this patch set because I moved the global variables into xlogrecord.c. For comparison, when we are writing a continuation record, the XLP_FIRST_IS_CONTRECORD flag is also set by the caller, CopyXLogRecordToWAL(), not AdvanceXLInsertBuffer() itself. So I think this is marginally more clear anyway. 2. Use correct WAL position in error message on invalid XLOG page header. This is the thing that Robert pointed out in the "xlog.c: removing ReadRecPtr and EndRecPtr" thread. I needed to make the change for the refactoring anyway, but since it's a minor bug fix, it seemed better to extract it to a separate commit, after all. Responses to Robert's comments below: On 20/10/2021 22:06, Robert Haas wrote: > - Some logic to (a) sanity-check the control file's REDO pointer, (b) > set InRecovery = true, and (c) update various bits of control file > state in memory has been moved substantially earlier. The actual > update of the control file on disk stays where it was before. At least > on first reading, I don't really like this. On the one hand, I don't > see a reason why it's necessary prerequisite for splitting xlog.c. On > the other hand, it seems a bit dangerous. The new contents of the control file are determined by the checkpoint record, presence of backup label file, and whether we're doing archive recovery. We have that information at hand in InitWalRecovery(), whereas the caller doesn't know or care whether a backup label file was present, for example. That's why I wanted to move that logic to InitWalRecovery(). However, I was afraid of moving the actual call to UpdateControlFile() there. That would be a bigger behavioral change. What if initializing one of the subsystems fails? Currently, the control file is left unchanged, but if we called UpdateControlFile() earlier, then it would be modified already. > There's now ~8 calls to functions in other modules between the time > you change things in memory and the time that you call > UpdateControlFile(). Perhaps none of those functions can call > anything that might in turn call UpdateControlFile() but I don't know > why we should take the chance. Is there some advantage to having the > in-memory state out of sync with the on-disk state across all that > code? The functions that get called in between don't call UpdateControlFile() and don't affect what gets written there. It would be pretty questionable if they did, even on master. But for the sake of the argument, let's see what would happen if they did: master: The later call to UpdateControlFile() writes out the same values again. Unless the changed field was one of the following: 'state', 'checkPoint', 'checkPointCopy', 'minRecoveryPoint', 'minRecoveryPointTLI', 'backupStartPoint', 'backupEndRequired' or 'time'. If it was one of those, then it may be overwritten with the values deduced from the starting checkpoint. After these patches: The later call to UpdateControlFile() writes out the same values again, even if it was one of those fields. Seems like a wash to me. It's hard to tell which behavior would be the correct one. On 'master', InRecovery might or might not already be set when we call those functions. It is already set if there was a backup label file, but if we're doing recover for any other reason, it's set only later. That's pretty sloppy. We check InRecovery in various assertions, and it affects whether UpdateMinRecoveryPoint() updates the control file or not, among other things. With these patches, InRecovery is always set at that point (or not, if recovery is not needed). That's a bit besides the point here, but it highlights that the current coding isn't very robust either if those startup functions tried to modify the control file. I think these patches make it a little better, or at least not worse. > - The code to clear InArchiveRecovery and close the WAL segment we had > open moves earlier. I think it might be possible to fail > Assert(InArchiveRecovery), because where you've moved this code, we > haven't yet verified that we reached the minimum recovery point. See > the comment which begins "It's possible that archive recovery was > requested, but we don't know how far we need to replay the WAL before > we reach consistency." What if we reach that point, then fail the big > hairy if-test and don't set InArchiveRecovery = true? In that case, we > can still do it later, in ReadRecord. But maybe that will never > happen. Actually it's not entirely clear to me that the assertion is > bulletproof even where it is right now, but moving it earlier makes me > even less confident. Possibly I just don't understand this well > enough. Hmm, yeah, this logic is hairy. I tried to find a case where that assertion would fail but couldn't find one. I believe it's correct, but we could probably make it more clear. In a nutshell, PerformWalRecovery() will never return, if (ArchiveRecoveryRequested && !InArchiveRecovery). Why? There are two ways that PerformWalRecovery() can return: 1. After reaching end of WAL. ReadRecord() will always always set InArchiveRecovery in that case, if ArchiveRecoveryRequested was set. It won't return NULL without doing that. 2. We reached the requested recovery target point. There's a check for that case in PerformWalRecovery(), it will throw an "ERROR: requested recovery stop point is before consistent recovery point" if that happens before InArchiveRecovery is set. Because reachedConsistency isn't set until crash recovery is finished. That said, independently of this patch series, perhaps that assertion should be changed into something like this: if (ArchiveRecoveryRequested) { - Assert(InArchiveRecovery); + /* + * If archive recovery was requested, we should not finish + * recovery before starting archive recovery. + * + * There are other checks for this in PerformWalRecovery() so + * this shouldn't happen, but let's be safe. + */ + if (!InArchiveRecovery) + elog(ERROR, "archive recovery was requested, but recovery finished before it started"); > It's a little tempting, too, to see if you could somehow consolidate > the two places that do if (readFile >= 0) { close(readFile); readFile > = -1 } down to one. Yeah, I thought about that, but couldn't find a nice way to do it. > - getRecoveryStopReason() is now called earlier than before, and is > now called whether or not ArchiveRecoveryRequested. This seems to > just move the point of initialization further from the point of use > to no real advantage, and I also think that the function is only > designed to do something useful for archive recovery, so calling it > in other cases just seems confusing. On the other hand, it's now closer to the actual end-of-recovery. The idea here is that it seems natural to return the reason that recovery ended along with all the other end-of-recovery information, in the same EndOfWalRecoveryInfo struct. Kyotaro commented on the same thing and suggested keeping the call getRecoveryStopReason() where it was. That'd require exposing getRecoveryStopReason() from xlogrecovery.c. Which isn't a big deal, we could do it, but in general I tried to minimize the surface area between xlog.c and xlogrecovery.c. If getRecoveryStopReason() was a separate function, should standby_signal_file_found and recovery_signal_file_found also be separate functions? I'd prefer to gather all the end-of-recovery information into one struct. > That's all I have on 0001. Is this kind of review helpful? Yes, very helpful, thank you! - Heikki
Attachment
pgsql-hackers by date: