Robert Haas <robertmhaas@gmail.com> writes:
> IMHO it's 100% clear how to make it robust. If you want to check that
> two values are the same, you can't let one of them be overwritten by
> an unrelated event in the middle of the check. There are many specific
> things we could do here, a few of which I proposed in my previous
> email, but they all boil down to "don't let autovacuum screw up the
> results".
It doesn't really matter how robust a test case is, if it isn't testing
the thing you need to have tested. So I remain unwilling to disable
autovac in a way that won't match real-world usage.
Note that the patch you proposed at [1] will not fix anything.
It turns off autovac in the new node, but the buildfarm failures
we've seen appear to be due to autovac running on the old node.
(I believe that autovac in the new node is *also* a hazard, but
it seems to be a lot less of one, presumably because of timing
considerations.) To make it work, we'd have to shut off autovac
in the old node before starting pg_upgrade, and that would make it
unacceptably (IMHO) different from what real users will do.
Conceivably, we could move all of this processing into pg_upgrade
itself --- autovac disable/re-enable and capturing of the horizon
data --- and that would address my complaint. I don't really want
to go there though, especially when in the final analysis IT IS
NOT A BUG if a rel's horizons advance a bit during pg_upgrade.
It's only a bug if they become inconsistent with the rel's data,
which is not what this test is testing for.
regards, tom lane
[1] https://www.postgresql.org/message-id/CA%2BTgmoZkBcMi%2BNikxfc54dgkWj41Q%3DZ4nuyHpheTcxA-qfS5Qg%40mail.gmail.com