Hi,
On 2021-01-21 16:23:58 +0100, Denis Laxalde wrote:
> We found an issue in pg_upgrade on a cluster with a third-party
> background worker. The upgrade goes fine, but the new cluster is then in
> an inconsistent state. The background worker comes from the PoWA
> extension but the issue does not appear to be related to this particular
> code.
Well, it does imply that the background worker did something, as the mere
existence of a bgworker shouldn't affect anything. Presumably the issue is
that the bgworker actually does transactional writes, which causes problems
because the xids / multixacts from early during pg_upgrade won't actually
be valid after we do pg_resetwal etc.
> As a solution, it seems that, for similar reasons that we restrict
> socket access to prevent accidental connections (from commit
> f763b77193), we should also prevent background workers from starting at
> this step.
I think that'd have quite the potential for negative impact - imagine
extensions that refuse to be loaded outside of shared_preload_libraries
(e.g. because they need to allocate shared memory) but that are required
during the course of pg_upgrade (e.g. because it's a tableam, a PL or
such). The server will then try to load those libraries during the upgrade
(due to the _PG_init() hook being called when functions from the
extension are needed, e.g. the tableam or PL handler).
Nor is it clear to me that the only way this would be problematic is
with shared_preload_libraries. A library in local_preload_libraries, or
a demand loaded library can trigger bgworkers (or database writes in
some other form) as well.
I wonder if we could
a) set default_transaction_read_only to true, and explicitly override it
   in the sessions that need to write.
b) when in binary upgrade mode / -b, error out on all WAL writes in
   sessions that don't explicitly set a session-level GUC to allow
   writes.
Greetings,
Andres Freund