Tom Lane wrote:
> While nosing around the problem areas, I think I've found yet another
> issue here. The global bool InRecovery is only maintained correctly
> in the startup process, which wasn't a problem before 8.4. However,
> if we are making the bgwriter execute the end-of-recovery checkpoint,
> there are multiple places where it is tested that are going to be
> executed by bgwriter. I think (but am not 100% sure) that these
> are all the at-risk references:
>	XLogFlush
>	CheckPointMultiXact
>	CreateCheckPoint (2 places)
> Heikki's latest patch deals with the tests in CreateCheckPoint (rather
> klugily IMO) but not the others. I think it might be better to fix
> things so that InRecovery is maintained correctly in the bgwriter too.

We could set InRecovery=true in CreateCheckPoint if it's the
end-of-recovery checkpoint, and reset it afterwards. I'm not 100% sure
it's safe to have bgwriter running with InRecovery=true at other times.
Grepping for InRecovery doesn't show anything else that bgwriter calls,
but limiting the flag to the duration of that checkpoint feels safer.
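
To make the set-and-reset idea concrete, here is a rough standalone sketch. It is not a patch against xlog.c: InRecovery is a local stand-in for the real global, and SKETCH_CHECKPOINT_END_OF_RECOVERY, checkpoint_body() and sketch_create_checkpoint() are hypothetical names made up for illustration.

```c
#include <stdbool.h>

/* Stand-in for the per-process global that the real code tests. */
static bool InRecovery = false;

/* Hypothetical flag bit marking the end-of-recovery checkpoint. */
#define SKETCH_CHECKPOINT_END_OF_RECOVERY 0x0001

/* Records what the InRecovery-sensitive code observed. */
static bool seen_in_recovery = false;

/* Stand-in for the parts of the checkpoint that test InRecovery
 * (the at-risk references listed in the quoted mail). */
static void
checkpoint_body(void)
{
    seen_in_recovery = InRecovery;
}

/* Set InRecovery only for the end-of-recovery checkpoint, then put
 * the previous value back so the process never keeps running with
 * the flag set at other times. */
static void
sketch_create_checkpoint(int flags)
{
    bool saved = InRecovery;

    if (flags & SKETCH_CHECKPOINT_END_OF_RECOVERY)
        InRecovery = true;

    checkpoint_body();

    InRecovery = saved;
}
```

The sketch ignores error paths. If the checkpoint errors out partway through, the reset at the end would be skipped, so a real version would presumably have to restore the flag there as well.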

Hmm, I see another small issue. We now keep track of the "minimum
recovery point". Whenever a data page is flushed during recovery, we set
the minimum recovery point to the LSN of the page in XLogFlush(), instead
of fsyncing WAL like we do in normal operation. During the
end-of-recovery checkpoint, however, RecoveryInProgress() returns false,
so we don't update the minimum recovery point in XLogFlush(). You're
unlikely to be bitten by that in practice; you would need to crash during
the end-of-recovery checkpoint, and then set the recovery target to an
earlier point. It should be fixed nevertheless.
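
A toy model of the branch in question, to show where the update gets skipped. Everything here is a simplified stand-in rather than the real XLogFlush(): SketchLSN, the four globals and sketch_xlog_flush() are hypothetical, and the extra end_of_recovery_checkpoint test is just one possible way to close the gap, not necessarily what the fix should look like.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t SketchLSN;     /* simplified stand-in for an LSN */

static SketchLSN min_recovery_point = 0;
static SketchLSN wal_flushed_upto = 0;

/* What RecoveryInProgress() would report in this process. */
static bool recovery_in_progress = true;

/* Hypothetical: true only while the end-of-recovery checkpoint runs. */
static bool end_of_recovery_checkpoint = false;

/* Called when a data page with the given LSN is written out. */
static void
sketch_xlog_flush(SketchLSN page_lsn)
{
    if (recovery_in_progress || end_of_recovery_checkpoint)
    {
        /* Recovery: advance the minimum recovery point, never
         * move it backwards, and do not flush WAL. */
        if (page_lsn > min_recovery_point)
            min_recovery_point = page_lsn;
        return;
    }

    /* Normal operation: make sure WAL is flushed up to the page LSN. */
    if (page_lsn > wal_flushed_upto)
        wal_flushed_upto = page_lsn;
}
```

Without the second condition, a page flushed while recovery_in_progress is already false takes the normal-operation path and leaves min_recovery_point where it was, which is the gap described above.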
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com