On Fri, Feb 28, 2025 at 02:51:27PM -0600, Nathan Bossart wrote:
> Cool. I appreciate the design feedback.

One other design point I wanted to bring up is whether we should bother
generating a rollback script for the new "swap" mode. In short, I'm
wondering if it would be unreasonable to say that, just for this mode, once
pg_upgrade enters the file transfer step, reverting to the old cluster
requires restoring a backup. I believe that's worth considering for the
following reasons:
* Anecdotally, I'm not sure I've ever actually seen pg_upgrade fail during
  or after file transfer, and I'm hoping to get some real data about that
  in the near future. Has anyone else dealt with such a failure? I
  suspect that failures during file transfer are typically due to OS
  crashes, power losses, etc., and hopefully those are rare.
* I've spent quite some time trying to generate a portable script, but it's
  complicated, and its correctness is difficult to reason about. And I
  haven't even started on the Windows version. Leaving this part out would
  simplify the patch set quite a bit.
* If we give up the idea of reverting to the old cluster, we can also avoid
  a bunch of intermediate fsync() calls, which I only included to make it
  easier to reason about the state of the files if the upgrade fails
  halfway through. This might not add up to much, but it's at least
  another area of simplification.

Of course, rollback would still be possible, but you'd really need to
understand what "swap" mode does behind the scenes to do so safely. In any
case, I'm growing skeptical that a probably-not-super-well-tested script
that extremely few people will need and fewer will use is worth the
complexity.
--
nathan