On Mon, Aug 07, 2023 at 12:42:33PM +0530, Amit Kapila wrote:
> On Mon, Aug 7, 2023 at 11:29 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
> >
> > Unless I'm missing something I don't see what prevents something to connect
> > using the replication protocol and issue any query or even create new
> > replication slots?
> >
>
> I think the point is that if we have any slots where we have not
> consumed the pending WAL (other than the expected like
> SHUTDOWN_CHECKPOINT) or if there are invalid slots then the upgrade
> won't proceed and we will request user to remove such slots or ensure
> that WAL is consumed by slots. So, I think in the case you mentioned,
> the upgrade won't succeed.

What if new slots are created while the old instance is running in the middle
of pg_upgrade, *after* the various checks have been done?

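To make the window I have in mind concrete, here is a toy model of that
check-then-act gap. It is only a sketch: OldCluster, slots_are_safe and
upgrade are hypothetical names that do not exist in pg_upgrade or PostgreSQL,
and the "intruder" callback simply stands in for any client that connects
while the old instance is up again after the checks.

```python
# Toy model only: none of these names exist in pg_upgrade or PostgreSQL.
class OldCluster:
    def __init__(self):
        self.running = False
        self.slots = {}  # slot name -> True if all pending WAL was consumed

    def start(self):
        self.running = True

    def stop(self):
        self.running = False

    def create_slot(self, name, caught_up=False):
        # Stands in for a client connecting over the replication protocol.
        if not self.running:
            raise RuntimeError("cluster is not running")
        self.slots[name] = caught_up


def slots_are_safe(cluster):
    # The pre-upgrade check: every slot must have consumed its WAL.
    return all(cluster.slots.values())


def upgrade(cluster, intruder=None):
    cluster.start()
    checked_ok = slots_are_safe(cluster)  # time of check
    cluster.stop()
    if not checked_ok:
        return "refused"

    cluster.start()  # old instance is started again later on
    if intruder is not None:
        intruder(cluster)  # nothing re-runs the check at this point
    still_ok = slots_are_safe(cluster)  # state actually carried over
    cluster.stop()
    return "ok" if still_ok else "unsafe state migrated"
```

In this model, upgrade(OldCluster()) passes, while passing an intruder that
calls create_slot() during the second start gets through the initial check and
still ends up with a slot that has unconsumed WAL.
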
> > Note also that as complained a few years ago nothing prevents a bgworker from
> > spawning up during pg_upgrade and possibly corrupt the upgraded cluster if
> > multixid are assigned. If publications are preserved wouldn't it mean that
> > such bgworkers could also lead to data loss?
> >
>
> Is it because such workers would write some WAL which slots may not
> process? If so, I think it is equally dangerous as other problems that
> can arise due to such a worker. Do you think of any special handling
> here?

Yes, and there have already been multiple reports of multixact corruption due
to bgworker activity during pg_upgrade (see
https://www.postgresql.org/message-id/20210121152357.s6eflhqyh4g5e6dv@dalibo.com
for instance). I think we should fix this whole class of problems once and for
all, one way or another.