RE: pg_upgrade and logical replication - Mailing list pgsql-hackers

From	Hayato Kuroda (Fujitsu)
Subject	RE: pg_upgrade and logical replication
Date	February 14, 2024 03:37:03
Msg-id	TYCPR01MB120773E8D02D58F50B24DA6ADF54E2@TYCPR01MB12077.jpnprd01.prod.outlook.com
In response to	Re: pg_upgrade and logical replication (Justin Pryzby <pryzby@telsasoft.com>)
Responses	Re: pg_upgrade and logical replication Re: pg_upgrade and logical replication Re: pg_upgrade and logical replication
List	pgsql-hackers

Dear Justin,

> pg_upgrade/t/004_subscription.pl says
>
> |my $mode = $ENV{PG_TEST_PG_UPGRADE_MODE} || '--copy';
>
> ..but I think maybe it should not.
>
> When you try to use --link, it fails:
> https://cirrus-ci.com/task/4669494061170688
>
> |Adding ".old" suffix to old global/pg_control                 ok
> |
> |If you want to start the old cluster, you will need to remove
> |the ".old" suffix from
> /tmp/cirrus-ci-build/build/testrun/pg_upgrade/004_subscription/data/t_004_su
> bscription_old_sub_data/pgdata/global/pg_control.old.
> |Because "link" mode was used, the old cluster cannot be safely
> |started once the new cluster has been started.
> |...
> |
> |postgres: could not find the database system
> |Expected to find it in the directory
> "/tmp/cirrus-ci-build/build/testrun/pg_upgrade/004_subscription/data/t_004_s
> ubscription_old_sub_data/pgdata",
> |but could not open file
> "/tmp/cirrus-ci-build/build/testrun/pg_upgrade/004_subscription/data/t_004_s
> ubscription_old_sub_data/pgdata/global/pg_control": No such file or directory
> |# No postmaster PID for node "old_sub"
> |[19:36:01.396](0.250s) Bail out!  pg_ctl start failed
>

Good catch! The primary cause of the failure is that the test reuses the old cluster
even after a successful upgrade. The documentation says [1]:

>
If you use link mode, the upgrade will be much faster (no file copying) and use less
disk space, but you will not be able to access your old cluster once you start the new
cluster after the upgrade.
>

> You could rename pg_control.old to avoid that immediate error, but that doesn't
> address the essential issue that "the old cluster cannot be safely started once
> the new cluster has been started."

Yeah, I agree that accessing the old cluster after the upgrade should be avoided.
IIUC, pg_upgrade is run three times in 004_subscription:

1. successful upgrade
2. failure due to insufficient max_replication_slots
3. failure because pg_subscription_rel has an entry in the 'd' state

The old instance is reused in all of these runs. Therefore, the most reasonable fix is to
change the ordering of the tests, i.e., the "successful upgrade" should be done last.
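
To illustrate why the ordering matters, here is a toy model in Python. It is not the
actual TAP test: the class and function names are made up for this sketch, and it assumes
only what the log above shows, namely that a successful --link run leaves the old cluster
without global/pg_control, so a later start of the old cluster fails.

```python
# Toy model only: none of these names exist in PostgreSQL or its test suite.
class OldCluster:
    def __init__(self):
        # Stands in for global/pg_control being present in the old data directory.
        self.startable = True

    def start(self):
        if not self.startable:
            raise RuntimeError("could not find the database system")


def run_pg_upgrade(old, mode, should_succeed):
    """Model one test step: start the old cluster, then run pg_upgrade."""
    old.start()  # every step needs a usable old cluster
    if should_succeed and mode == "--link":
        # Assumption taken from the log: after a successful --link upgrade,
        # pg_control has been renamed to pg_control.old.
        old.startable = False
    return should_succeed


def run_sequence(mode, steps):
    """Return the index of the first step that cannot start the old cluster,
    or None if every step could start it."""
    old = OldCluster()
    for i, should_succeed in enumerate(steps):
        try:
            run_pg_upgrade(old, mode, should_succeed)
        except RuntimeError:
            return i
    return None


original_order = [True, False, False]   # success, then the two expected failures
reordered = [False, False, True]        # expected failures first, success last

print(run_sequence("--link", original_order))
print(run_sequence("--link", reordered))
print(run_sequence("--copy", original_order))
```

In this model, only the original ordering combined with --link gets stuck (at the second
step); the reordered sequence works in both modes.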

The attached patch modifies the test accordingly. It also contains some optimizations.
With the patch, the test passes in my environment:

```
pg_upgrade]$ PG_TEST_PG_UPGRADE_MODE='--link' PG_TEST_TIMEOUT_DEFAULT=10 make check PROVE_TESTS='t/004_subscription.pl'
...
# +++ tap check in src/bin/pg_upgrade +++
t/004_subscription.pl .. ok
All tests successful.
Files=1, Tests=14,  9 wallclock secs ( 0.03 usr  0.00 sys +  0.55 cusr  1.08 csys =  1.66 CPU)
Result: PASS
```

What do you think?

[1]: https://www.postgresql.org/docs/devel/pgupgrade.html

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/

Attachment

0001-Fix-testcase.patch
