On Mon, Feb 1, 2016 at 6:58 PM, Andres Freund <andres@anarazel.de> wrote:
> Currently, if we can't read a checkpoint record when not in standby
> mode, we automatically fall back to the previous checkpoint and start
> replay from there.
>
> Doing so without user intervention doesn't actually seem like a good
> idea. While not super likely, it's entirely possible that doing so can
> wreck a cluster that'd otherwise be easily recoverable. Imagine e.g. a
> tablespace being dropped - going back to the previous checkpoint very
> well could lead to replay not finishing, as the directory to create
> files in doesn't even exist.
>
> As there's, afaics, really no "legitimate" reason for needing to go
> back to the previous checkpoint I don't think we should do so in an
> automated fashion.
>
> All the cases where I could find logs containing "using previous
> checkpoint record at" were when something else had already gone pretty
> badly wrong. Now that obviously doesn't have a very large significance,
> because the situations where it "just worked" are unlikely to be
> reported...
>
> Am I missing a reason for doing this by default?
I agree: this seems like a terrible idea. Would we still have some
way of forcing the older checkpoint record to be used if somebody
wants to try to do that?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company