Thread: pg_upgrade instructions involving "rsync --size-only" might lead to standby corruption?

Hi!

(posting this to -hackers rather than to -docs since it seems a deeper problem than just adjusting the docs)

I recently observed a case of standby corruption after upgrading pg12 to pg14, which presented as XX001 errors on the new cluster's standby nodes, e.g.:
  ERROR: missing chunk number 0 for toast value 3228893903 in pg_toast_79504413

Comparing the contents of the data directories and checking files with md5sum, I noticed that some files of some TOAST indexes had different content on the new standby nodes compared to the new primary – and, interestingly, all standbys had the same content, just different from the primary's.

We used the "rsync --size-only" snippet from the docs https://www.postgresql.org/docs/current/pgupgrade.html to upgrade standbys.

With "--size-only", 1 GiB files for tables and indexes obviously cannot be reliably synchronized. In our case, we perform additional steps involving logical replication, advancing the primary to a certain LSN position – and during that, we keep the standbys down. This explains the increased corruption risk. But I think these risks are present for anyone who just follows the steps in the docs as is, and probably some fixes or improvements are needed here.
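To illustrate the failure mode, here is a small local sketch (all /tmp paths are made up for the demo, not from the docs): two files of the same size but different content are treated as identical by "--size-only", so the destination keeps its stale copy.

```shell
# Same-size, different-content files: --size-only considers them in sync.
mkdir -p /tmp/demo_src /tmp/demo_dst
printf 'AAAA' > /tmp/demo_src/relfile
printf 'BBBB' > /tmp/demo_dst/relfile        # same size, different content
rsync --archive --size-only /tmp/demo_src/ /tmp/demo_dst/
cat /tmp/demo_dst/relfile                    # prints BBBB: rsync skipped the file
```

This is exactly what happens to a 1 GiB relation segment whose blocks diverged between primary and standby without the size changing.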

The main question: why do we consider "rsync --size-only" reliable in the general case? Can standby corruption happen if we follow the steps from https://www.postgresql.org/docs/current/pgupgrade.html?

Considering several general situations:
1. For streaming replication:
    a. if we shut down the primary first, then, based on the code in walsender.c defining how the shutdown event is handled, replicas should receive all the changes?
    b. if we shut down the standbys first (which might be preferred if we run the cluster under Patroni control, to avoid unnecessary failover), then some changes from the primary won't be received by the standbys – and we do have standby corruption risks
2. For replication based on WAL shipping, I don't think we can guarantee that all changes are propagated to standbys.

The docs also have this:

> 9. Prepare for standby server upgrades 
> If you are upgrading standby servers using methods outlined in section Step 11, verify that the old standby servers are caught up by running pg_controldata against the old primary and standby clusters. Verify that the “Latest checkpoint location” values match in all clusters. (There will be a mismatch if old standby servers were shut down before the old primary or if the old standby servers are still running.) Also, make sure wal_level is not set to minimal in the postgresql.conf file on the new primary cluster.

– admitting that there might be a mismatch. But if there is a mismatch, rsync --size-only is not going to help synchronize properly, right?
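The check the docs describe boils down to comparing one pg_controldata field across all nodes. A sketch of the comparison logic (the real values come from running `pg_controldata $PGDATA` on each node; the strings below are sample outputs standing in for the real command):

```shell
# Extract and compare "Latest checkpoint location" from pg_controldata output.
primary=$(echo 'Latest checkpoint location:           0/3000060' | awk '{print $NF}')
standby=$(echo 'Latest checkpoint location:           0/3000060' | awk '{print $NF}')
if [ "$primary" = "$standby" ]; then
    echo "OK: checkpoint locations match at $primary"
else
    echo "MISMATCH: primary=$primary standby=$standby - do not proceed" >&2
fi
```

If the values differ, the standby did not receive the primary's shutdown checkpoint, and rsync --size-only will not repair that.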

I was thinking about how to improve here, some ideas:
- "rsync --checksum" doesn't seem to be a good idea: it is, unfortunately, very, very slow, though it would be the most reliable approach (and since it's slow, I guess it's not even worth mentioning, so I'm crossing this out)
- we could remove "--size-only" and rely on the default rsync behavior – checking size and modification time; but how reliable would that be in the general case?
- make the step verifying “Latest checkpoint location” *after* shutting down all nodes mandatory, with instructions on how to avoid a mismatch: e.g., shut down the primary first, disabling automated failover software, if any, then run pg_controldata on the standbys while they are running, and on the primary once it's already shut down (probably, different instructions are needed for the WAL shipping and streaming cases)
- probably, we should always run "rsync --checksum" for pg_wal 
- I think it's time to provide a snippet for running "rsync" in multiple threads. A lot of installations today have many vCPUs and fast SSDs, and running single-threaded rsync seems very slow (especially if we do need to move away from "--size-only"). If it makes sense, I could come up with a patch proposal for the docs
- it's probably time to implement support for standby upgrades in pg_upgrade itself, finding some way to take care of standbys and removing the need to run rsync or to rebuild standby nodes. Although, this is just a raw idea without a proper proposal yet.

Does this make sense, or am I missing something and the current docs describe a reliable process? (As I said, we deviated from the process to involve logical replication, so I'm not 100% sure I'm right in suspecting the original procedure of having standby corruption risks.)

Thanks,
Nikolay Samokhvalov
Founder, Postgres.ai
On Thu, Jun 29, 2023 at 1:50 PM Nikolay Samokhvalov <nik@postgres.ai> wrote:
> Does this make sense or I'm missing something and the current docs describe a reliable process? (As I said, we have
> deviated from the process, to involve logical replication, so I'm not 100% sure I'm right suspecting the original
> procedure in having standby corruption risks.)

I'm very suspicious about this section of the documentation. It
doesn't explain why --size-only is used or why --no-inc-recursive is
used.

> > 9. Prepare for standby server upgrades
> > If you are upgrading standby servers using methods outlined in section Step 11, verify that the old standby servers
> > are caught up by running pg_controldata against the old primary and standby clusters. Verify that the “Latest
> > checkpoint location” values match in all clusters. (There will be a mismatch if old standby servers were shut down
> > before the old primary or if the old standby servers are still running.) Also, make sure wal_level is not set to
> > minimal in the postgresql.conf file on the new primary cluster.
>
> – admitting that there might be mismatch. But if there is mismatch, rsync --size-only is not going to help
> synchronize properly, right?

I think the idea is that you shouldn't use the procedure in this case.
But honestly I don't think it's probably a good idea to use this
procedure at all. It's not clear enough under what circumstances, if
any, it's safe to use, and there's not really any way to know if
you've done it correctly. You couldn't pay me enough to recommend this
procedure to anyone.

--
Robert Haas
EDB: http://www.enterprisedb.com



On Thu, Jun 29, 2023 at 02:38:58PM -0400, Robert Haas wrote:
> On Thu, Jun 29, 2023 at 1:50 PM Nikolay Samokhvalov <nik@postgres.ai> wrote:
> > Does this make sense or I'm missing something and the current docs describe a reliable process? (As I said, we have
> > deviated from the process, to involve logical replication, so I'm not 100% sure I'm right suspecting the original
> > procedure in having standby corruption risks.)
>
> 
> I'm very suspicious about this section of the documentation. It
> doesn't explain why --size-only is used or why --no-inc-recursive is
> used.

I think --size-only was chosen only because it is the minimal comparison
option.
 
> > > 9. Prepare for standby server upgrades
> > > If you are upgrading standby servers using methods outlined in section Step 11, verify that the old standby servers
> > > are caught up by running pg_controldata against the old primary and standby clusters. Verify that the “Latest
> > > checkpoint location” values match in all clusters. (There will be a mismatch if old standby servers were shut down
> > > before the old primary or if the old standby servers are still running.) Also, make sure wal_level is not set to
> > > minimal in the postgresql.conf file on the new primary cluster.
> >
> > – admitting that there might be mismatch. But if there is mismatch, rsync --size-only is not going to help
> > synchronize properly, right?
>
> I think the idea is that you shouldn't use the procedure in this case.
> But honestly I don't think it's probably a good idea to use this
> procedure at all. It's not clear enough under what circumstances, if
> any, it's safe to use, and there's not really any way to know if
> you've done it correctly. You couldn't pay me enough to recommend this
> procedure to anyone.

I think it would be good to revisit all the steps outlined in that
procedure and check which ones are still valid or need adjusting.  It is
very possible the original steps have bugs or that new Postgres features
added since the steps were created don't work with these steps.  I think
we need to bring Stephen Frost into the discussion, so I have CC'ed him.

Frankly, I didn't think the documented procedure would work either, but
people say it does, so it is in the docs.  I do think it is overdue for
a re-analysis.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Only you can decide what is important to you.



On Fri, Jun 30, 2023 at 1:41 PM Bruce Momjian <bruce@momjian.us> wrote:
> I think --size-only was chosen only because it is the minimal comparison
> option.

I think it's worse than that. I think that the procedure relies on
using the --size-only option to intentionally trick rsync into
thinking that files are identical when they're not.

Say we have a file like base/23246/78901 on the primary. Unless
wal_log_hints=on, the standby version is very likely different, but
only in ways that don't matter to WAL replay. So the procedure aims to
trick rsync into hard-linking the version of that file that exists on
the standby in the old cluster into the new cluster on the standby,
instead of copying the slightly-different version from the master,
thus making the upgrade very fast. If rsync actually checksummed the
files, it would realize that they're different and copy the file from
the original primary, which the person who wrote this procedure does
not want.

That's kind of a crazy thing for us to be documenting. I think we
really ought to consider removing it from this documentation. If somebody
wants to write a reliable tool for this to ship as part of PostgreSQL,
well and good. But this procedure has no real sanity checks and is
based on very fragile assumptions. That doesn't seem suitable for
end-user use.

I'm not quite clear on how Nikolay got into trouble here. I don't
think I understand under exactly what conditions the procedure is
reliable and under what conditions it isn't. But there is no way in
heck I would ever advise anyone to use this procedure on a database
they actually care about. This is a great party trick or something to
show off in a lightning talk at PGCon, not something you ought to be
doing with valuable data that you actually care about.

--
Robert Haas
EDB: http://www.enterprisedb.com



On Fri, Jun 30, 2023 at 04:16:31PM -0400, Robert Haas wrote:
> On Fri, Jun 30, 2023 at 1:41 PM Bruce Momjian <bruce@momjian.us> wrote:
> > I think --size-only was chosen only because it is the minimal comparison
> > option.
> 
> I think it's worse than that. I think that the procedure relies on
> using the --size-only option to intentionally trick rsync into
> thinking that files are identical when they're not.
> 
> Say we have a file like base/23246/78901 on the primary. Unless
> wal_log_hints=on, the standby version is very likely different, but
> only in ways that don't matter to WAL replay. So the procedure aims to
> trick rsync into hard-linking the version of that file that exists on
> the standby in the old cluster into the new cluster on the standby,
> instead of copying the slightly-different version from the master,
> thus making the upgrade very fast. If rsync actually checksummed the
> files, it would realize that they're different and copy the file from
> the original primary, which the person who wrote this procedure does
> not want.

What is the problem with having different hint bits between the two
servers?

> That's kind of a crazy thing for us to be documenting. I think we
> really ought to consider removing from this documentation. If somebody
> wants to write a reliable tool for this to ship as part of PostgreSQL,
> well and good. But this procedure has no real sanity checks and is
> based on very fragile assumptions. That doesn't seem suitable for
> end-user use.
> 
> I'm not quite clear on how Nikolay got into trouble here. I don't
> think I understand under exactly what conditions the procedure is
> reliable and under what conditions it isn't. But there is no way in
> heck I would ever advise anyone to use this procedure on a database
> they actually care about. This is a great party trick or something to
> show off in a lightning talk at PGCon, not something you ought to be
> doing with valuable data that you actually care about.

Well, it does get used, and if we remove it perhaps we can have it on
our wiki and point to it from our docs.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Only you can decide what is important to you.





On Fri, Jun 30, 2023 at 14:33 Bruce Momjian <bruce@momjian.us> wrote:
On Fri, Jun 30, 2023 at 04:16:31PM -0400, Robert Haas wrote:   
> I'm not quite clear on how Nikolay got into trouble here. I don't
> think I understand under exactly what conditions the procedure is
> reliable and under what conditions it isn't. But there is no way in
> heck I would ever advise anyone to use this procedure on a database
> they actually care about. This is a great party trick or something to
> show off in a lightning talk at PGCon, not something you ought to be
> doing with valuable data that you actually care about.

Well, it does get used, and if we remove it perhaps we can have it on
our wiki and point to it from our docs.

In my case, we performed some additional writes on the primary before running "pg_upgrade -k", and we did it *after* we shut down all the standbys. So those changes were not replicated, and "rsync --size-only" then ignored them. (By the way, that cluster has wal_log_hints=on to allow Patroni to run pg_rewind when needed.)

But this can happen to anyone who follows the procedure from the docs as is and doesn't do any additional steps, because in step 9, "Prepare for standby server upgrades":

1) there is no requirement to follow a specific order when shutting down the nodes
   - "Streaming replication and log-shipping standby servers can remain running until a later step" should probably be changed to a requirement like "keep them running"

2) checking the latest checkpoint position with pg_controldata now looks like a good thing to do, but its purpose is unclear – it does not seem to be used to support any decision
  - "There will be a mismatch if old standby servers were shut down before the old primary or if the old standby servers are still running" should probably be rephrased to say that if there is a mismatch, it's a big problem

So, following the steps as is, if some writes on the primary are not replicated (for whatever reason) before the execution of pg_upgrade -k + rsync --size-only, then those writes are going to be silently lost on the standbys.

I wonder: if we ensure that the standbys are fully caught up before upgrading the primary, and we check the latest checkpoint positions, are we good to use "rsync --size-only", or are there still some concerns? It seems so to me, but maybe I'm missing something.
--

Thanks,
Nikolay Samokhvalov
Founder, Postgres.ai
On Fri, Jun 30, 2023 at 03:18:03PM -0700, Nikolay Samokhvalov wrote:
> I wonder, if we ensure that standbys are fully caught up before upgrading the
> primary, if we check the latest checkpoint positions, are we good to use "rsync
> --size-only", or there are still some concerns? It seems so to me, but maybe
> I'm missing something.

Yes, I think you are correct.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Only you can decide what is important to you.



Fast upgrade of a highly available cluster is a vital part of being an industry-acceptable solution for any data management system, because the cluster is required to be highly available.

Without this documented technique, an upgrade of a 1 TB cluster would take many hours, not seconds.
There are already industry concerns about scalability beyond tens of terabytes per cluster, and such downtime would significantly lower that boundary.

On 1 Jul 2023, at 01:16, Robert Haas <robertmhaas@gmail.com> wrote:

 If somebody
wants to write a reliable tool for this to ship as part of PostgreSQL,
well and good.

IMV that's a good idea. We could teach pg_upgrade or some new tool to do this reliably. The tricky part is that the tool must stop and start standbys remotely...


Best regards, Andrey Borodin.
Greetings,

* Nikolay Samokhvalov (nik@postgres.ai) wrote:
> On Fri, Jun 30, 2023 at 14:33 Bruce Momjian <bruce@momjian.us> wrote:
> > On Fri, Jun 30, 2023 at 04:16:31PM -0400, Robert Haas wrote:
> > > I'm not quite clear on how Nikolay got into trouble here. I don't
> > > think I understand under exactly what conditions the procedure is
> > > reliable and under what conditions it isn't. But there is no way in
> > > heck I would ever advise anyone to use this procedure on a database
> > > they actually care about. This is a great party trick or something to
> > > show off in a lightning talk at PGCon, not something you ought to be
> > > doing with valuable data that you actually care about.
> >
> > Well, it does get used, and if we remove it perhaps we can have it on
> > our wiki and point to it from our docs.

I was never a fan of having it actually documented because it's a pretty
complex and involved process that really requires someone doing it to
have a strong understanding of how PG works.

> In my case, we performed some additional writes on the primary before
> running "pg_upgrade -k" and we did it *after* we shut down all the
> standbys. So those changes were not replicated and then "rsync --size-only"
> ignored them. (By the way, that cluster has wal_log_hints=on to allow
> Patroni run pg_rewind when needed.)

That's certainly going to cause problems..

> But this can happen with anyone who follows the procedure from the docs as
> is and doesn't do any additional steps, because in step 9 "Prepare for
> standby server upgrades":
>
> 1) there is no requirement to follow specific order to shut down the nodes
>    - "Streaming replication and log-shipping standby servers can remain
> running until a later step" should probably be changed to a
> requirement-like "keep them running"

Agreed that it would be good to clarify that the primary should be shut
down first, to make sure everything written by the primary has been
replicated to all of the replicas.

> 2) checking the latest checkpoint position with pg_controldata now looks
> like a thing that is good to do, but with uncertainty purpose -- it does
> not seem to be used to support any decision
>   - "There will be a mismatch if old standby servers were shut down before
> the old primary or if the old standby servers are still running" should
> probably be rephrased saying that if there is mismatch, it's a big problem

Yes, it's absolutely a big problem and that's the point of the check.
Slightly surprised that we need to explicitly say "if they don't match
then you need to figure out what you did wrong and don't move forward
until you get everything shut down and with matching values", but that's
also why it isn't a great idea to try and do this without a solid
understanding of how PG works.

> So following the steps as is, if some writes on the primary are not
> replicated (due to whatever reason) before execution of pg_upgrade -k +
> rsync --size-only, then those writes are going to be silently lost on
> standbys.

Yup.

> I wonder, if we ensure that standbys are fully caught up before upgrading
> the primary, if we check the latest checkpoint positions, are we good to
> use "rsync --size-only", or there are still some concerns? It seems so to
> me, but maybe I'm missing something.

I've seen a lot of success with it.

Ultimately, when I was describing this process, it was always with the
idea that it would be performed by someone quite familiar with the
internals of PG or, ideally, could be an outline of how an interested PG
hacker could write a tool to do it.  Hard to say, but I do feel like
having it documented has actually reduced the interest in writing a tool
to do it, which, if that's the case, is quite unfortunate.

Thanks,

Stephen

On Fri, Jul 7, 2023 at 6:31 AM Stephen Frost <sfrost@snowman.net> wrote:
* Nikolay Samokhvalov (nik@postgres.ai) wrote:
> But this can happen with anyone who follows the procedure from the docs as
> is and doesn't do any additional steps, because in step 9 "Prepare for
> standby server upgrades":
>
> 1) there is no requirement to follow specific order to shut down the nodes
>    - "Streaming replication and log-shipping standby servers can remain
> running until a later step" should probably be changed to a
> requirement-like "keep them running"

Agreed that it would be good to clarify that the primary should be shut
down first, to make sure everything written by the primary has been
replicated to all of the replicas.

Thanks!

Here is a patch to fix the existing procedure description.

I agree with Andrey – without it, we don't have any good way to upgrade large clusters in a short time. The default rsync mode (without "--size-only") takes a lot of time too, if the load is heavy.

With these adjustments, can "rsync --size-only" remain in the docs as the *fast* and safe method to upgrade standbys, or are there still some concerns related to corruption risks?
Hi,

On Mon, Jul 10, 2023 at 01:36:39PM -0700, Nikolay Samokhvalov wrote:
> On Fri, Jul 7, 2023 at 6:31 AM Stephen Frost <sfrost@snowman.net> wrote:
> > * Nikolay Samokhvalov (nik@postgres.ai) wrote:
> > > But this can happen with anyone who follows the procedure from the docs
> > as
> > > is and doesn't do any additional steps, because in step 9 "Prepare for
> > > standby server upgrades":
> > >
> > > 1) there is no requirement to follow specific order to shut down the
> > nodes
> > >    - "Streaming replication and log-shipping standby servers can remain
> > > running until a later step" should probably be changed to a
> > > requirement-like "keep them running"
> >
> > Agreed that it would be good to clarify that the primary should be shut
> > down first, to make sure everything written by the primary has been
> > replicated to all of the replicas.
> 
> Thanks!
> 
> Here is a patch to fix the existing procedure description.

Thanks for that!

> I agree with Andrey – without it, we don't have any good way to upgrade
> large clusters in short time. Default rsync mode (without "--size-only")
> takes a lot of time too, if the load is heavy.
> 
> With these adjustments, can "rsync --size-only" remain in the docs as the
> *fast* and safe method to upgrade standbys, or there are still some
> concerns related to corruption risks?

I hope somebody can answer that definitively, but I read Stephen's mail
to indicate that this procedure should be safe in principle (if you know
what you are doing).

> From: Nikolay Samokhvalov <nik@postgres.ai>
> Date: Mon, 10 Jul 2023 20:07:18 +0000
> Subject: [PATCH] Improve major upgrade docs

Maybe mention standby here, like "Improve major upgrade documentation
for standby servers".

> +++ b/doc/src/sgml/ref/pgupgrade.sgml
> @@ -380,22 +380,28 @@ NET STOP postgresql-&majorversion;
>      </para>
>  
>      <para>
> -     Streaming replication and log-shipping standby servers can
> +     Streaming replication and log-shipping standby servers must
>       remain running until a later step.
>      </para>
>     </step>
>  
> -   <step>
> +   <step id="pgupgrade-step-prepare-standbys">
>
>      <para>
> -     If you are upgrading standby servers using methods outlined in section <xref
> -     linkend="pgupgrade-step-replicas"/>, verify that the old standby
> -     servers are caught up by running <application>pg_controldata</application>
> -     against the old primary and standby clusters.  Verify that the
> -     <quote>Latest checkpoint location</quote> values match in all clusters.
> -     (There will be a mismatch if old standby servers were shut down
> -     before the old primary or if the old standby servers are still running.)
> +     If you are upgrading standby servers using methods outlined in 
> +     <xref linkend="pgupgrade-step-replicas"/>, 

You dropped the "section" before the xref, I think that should be kept
around.

> +                                                ensure that they were running when 
> +     you shut down the primaries in the previous step, so all the latest changes 

You talk of primaries in plural here, that is a bit weird for PostgreSQL
documentation.

> +     and the shutdown checkpoint record were received. You can verify this by running 
> +     <application>pg_controldata</application> against the old primary and standby 
> +     clusters. The <quote>Latest checkpoint location</quote> values must match in all 
> +     nodes. A mismatch might occur if old standby servers were shut down before 
> +     the old primary. To fix a mismatch, start all old servers and return to the 
> +     previous step; proceeding with mismatched 
> +     <quote>Latest checkpoint location</quote> may lead to standby corruption.
> +    </para>
> +
> +    <para>
>       Also, make sure <varname>wal_level</varname> is not set to
>       <literal>minimal</literal> in the <filename>postgresql.conf</filename> file on the
>       new primary cluster.
> @@ -497,7 +503,6 @@ pg_upgrade.exe
>       linkend="warm-standby"/>) standby servers, you can follow these steps to
>       quickly upgrade them.  You will not be running <application>pg_upgrade</application> on
>       the standby servers, but rather <application>rsync</application> on the primary.
> -     Do not start any servers yet.
>      </para>
>  
>      <para>
> @@ -508,6 +513,15 @@ pg_upgrade.exe
>       is running.
>      </para>
>  
> +    <para>
> +     Before running rsync, to avoid standby corruption, it is absolutely
> +     critical to ensure that both primaries are shut down and standbys 
> +     have received the last changes (see <xref linkend="pgupgrade-step-prepare-standbys"/>). 

I think this should be something like "ensure both that the primary is
shut down and that the standbys have received all the changes".

> +     Standbys can be running at this point or fully stopped.

"or be fully stopped." I think.

> +                                                             If they 
> +     are still running, you can stop, upgrade, and start them one by one; this
> +     can be useful to keep the cluster open for read-only transactions.
> +    </para>

Maybe this is clear from the context, but "upgrade" in the above should
maybe more explicitly refer to the rsync method or else people might
think one can run pg_upgrade on them after all?


Michael



On Mon, Jul 10, 2023 at 2:02 PM Michael Banck <mbanck@gmx.net> wrote:
Thanks for that!

Thanks for the fast review.
 

> I agree with Andrey – without it, we don't have any good way to upgrade
> large clusters in short time. Default rsync mode (without "--size-only")
> takes a lot of time too, if the load is heavy.
>
> With these adjustments, can "rsync --size-only" remain in the docs as the
> *fast* and safe method to upgrade standbys, or there are still some
> concerns related to corruption risks?

I hope somebody can answer that definitively, but I read Stephen's mail
to indicate that this procedure should be safe in principle (if you know
what you are doing).

right, this is my understanding too


> +++ b/doc/src/sgml/ref/pgupgrade.sgml
> @@ -380,22 +380,28 @@ NET STOP postgresql-&majorversion;
>      </para>

>      <para>
> -     Streaming replication and log-shipping standby servers can
> +     Streaming replication and log-shipping standby servers must
>       remain running until a later step.
>      </para>
>     </step>

> -   <step>
> +   <step id="pgupgrade-step-prepare-standbys">
>
>      <para>
> -     If you are upgrading standby servers using methods outlined in section <xref
> -     linkend="pgupgrade-step-replicas"/>, verify that the old standby
> -     servers are caught up by running <application>pg_controldata</application>
> -     against the old primary and standby clusters.  Verify that the
> -     <quote>Latest checkpoint location</quote> values match in all clusters.
> -     (There will be a mismatch if old standby servers were shut down
> -     before the old primary or if the old standby servers are still running.)
> +     If you are upgrading standby servers using methods outlined in
> +     <xref linkend="pgupgrade-step-replicas"/>,

You dropped the "section" before the xref, I think that should be kept
around.

This seems to be a problem with discussing source code that looks quite different from the final result.

In the result – the docs – we currently have "section Step 9", which looks weird. I still think it's good to remove it. We also have "in Step 17 below" (without the extra word "section") in a different place on the same page.
 

> +                                                ensure that they were running when
> +     you shut down the primaries in the previous step, so all the latest changes

You talk of primaries in plural here, that is a bit weird for PostgreSQL
documentation.

The same docs already discuss two primaries ("8. Stop both primaries"), but I agree it might look confusing if you read only a part of the doc, jumping into the middle of it, like I do all the time when using the docs in "check the reference" style.

I agree with this comment, but it tells me we need even more improvements to this doc, beyond my original goal – e.g., I don't like section 8 saying "Make sure both database servers"; it should say "both primaries".
 

> +     and the shutdown checkpoint record were received. You can verify this by running
> +     <application>pg_controldata</application> against the old primary and standby
> +     clusters. The <quote>Latest checkpoint location</quote> values must match in all
> +     nodes. A mismatch might occur if old standby servers were shut down before
> +     the old primary. To fix a mismatch, start all old servers and return to the
> +     previous step; proceeding with mismatched
> +     <quote>Latest checkpoint location</quote> may lead to standby corruption.
> +    </para>
> +
> +    <para>
>       Also, make sure <varname>wal_level</varname> is not set to
>       <literal>minimal</literal> in the <filename>postgresql.conf</filename> file on the
>       new primary cluster.
> @@ -497,7 +503,6 @@ pg_upgrade.exe
>       linkend="warm-standby"/>) standby servers, you can follow these steps to
>       quickly upgrade them.  You will not be running <application>pg_upgrade</application> on
>       the standby servers, but rather <application>rsync</application> on the primary.
> -     Do not start any servers yet.
>      </para>

>      <para>
> @@ -508,6 +513,15 @@ pg_upgrade.exe
>       is running.
>      </para>

> +    <para>
> +     Before running rsync, to avoid standby corruption, it is absolutely
> +     critical to ensure that both primaries are shut down and standbys
> +     have received the last changes (see <xref linkend="pgupgrade-step-prepare-standbys"/>).

I think this should be something like "ensure both that the primary is
shut down and that the standbys have received all the changes".

Well, we have two primary servers – old and new. I tried to clarify it in the new version.
 

> +     Standbys can be running at this point or fully stopped.

"or be fully stopped." I think.

> +                                                             If they
> +     are still running, you can stop, upgrade, and start them one by one; this
> +     can be useful to keep the cluster open for read-only transactions.
> +    </para>

Maybe this is clear from the context, but "upgrade" in the above should
maybe more explicitly refer to the rsync method or else people might
think one can run pg_upgrade on them after all?

Maybe. It will require changes in other parts of this doc.

Meanwhile, attached is v2

thanks for the comments
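
As a side note, the hazard with "--size-only" discussed upthread is easy to demonstrate in isolation: rsync skips any file whose size already matches, even when the contents differ. A self-contained sketch (temporary directories, made-up file name standing in for a relation segment):

```shell
# Demonstrates why "rsync --size-only" can miss changed relation files:
# files of identical size but different content are skipped entirely.
src=$(mktemp -d)
dst=$(mktemp -d)

printf 'AAAA' > "$src/16384"   # pretend this is a relation segment
printf 'BBBB' > "$dst/16384"   # same size (4 bytes), stale content

rsync --archive --size-only "$src/" "$dst/"

# The destination still holds the stale bytes, because rsync compared
# only the sizes and considered the file up to date:
cat "$dst/16384"; echo
```

This is exactly why the docs' rsync step is only safe when the standbys are known to be byte-identical (caught up and cleanly shut down) beforehand.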
Hi,

On Mon, Jul 10, 2023 at 02:37:24PM -0700, Nikolay Samokhvalov wrote:
> On Mon, Jul 10, 2023 at 2:02 PM Michael Banck <mbanck@gmx.net> wrote:
> > > +++ b/doc/src/sgml/ref/pgupgrade.sgml
> > > @@ -380,22 +380,28 @@ NET STOP postgresql-&majorversion;
> > >      </para>
> > >
> > >      <para>
> > > -     Streaming replication and log-shipping standby servers can
> > > +     Streaming replication and log-shipping standby servers must
> > >       remain running until a later step.
> > >      </para>
> > >     </step>
> > >
> > > -   <step>
> > > +   <step id="pgupgrade-step-prepare-standbys">
> > >
> > >      <para>
> > > -     If you are upgrading standby servers using methods outlined in
> > section <xref
> > > -     linkend="pgupgrade-step-replicas"/>, verify that the old standby
> > > -     servers are caught up by running
> > <application>pg_controldata</application>
> > > -     against the old primary and standby clusters.  Verify that the
> > > -     <quote>Latest checkpoint location</quote> values match in all
> > clusters.
> > > -     (There will be a mismatch if old standby servers were shut down
> > > -     before the old primary or if the old standby servers are still
> > running.)
> > > +     If you are upgrading standby servers using methods outlined in
> > > +     <xref linkend="pgupgrade-step-replicas"/>,
> >
> > You dropped the "section" before the xref, I think that should be kept
> > around.
> 
> Seems to be a problem in discussing source code that looks quite different
> than the final result.
> 
> In the result – the docs – we currently have "section Step 9", looking
> weird. I still think it's good to remove it. We also have "in Step 17
> below" (without the extra word "section") in a different place on the same
> page.

Ok.
 
> > > +                                                ensure that they were
> > running when
> > > +     you shut down the primaries in the previous step, so all the
> > latest changes
> >
> > You talk of primaries in plural here, that is a bit weird for PostgreSQL
> > documentation.
> 
> The same docs already discuss two primaries ("8. Stop both primaries"), but
> I agree it might look confusing if you read only a part of the doc, jumping
> into middle of it, like I do all the time when using the docs in "check the
> reference" style.

[...]

> > I think this should be something like "ensure both that the primary is
> > shut down and that the standbys have received all the changes".
> 
> Well, we have two primary servers – old and new. I tried to clarify it in
> the new version.

Yeah, sorry about that, I think I should have had coffee first and/or
slept on this review before sending it.

Maybe one reason why I was confused is because I consider a "primary"
more like a full server/VM, not necessarily a database instance (though
one could of course have a primary/secondary pair on the same host).
Possibly "primary instances" or something might be clearer, but I think
I should re-read the whole section first before commenting further.


Michael



On Mon, Jul 10, 2023 at 02:37:24PM -0700, Nikolay Samokhvalov wrote:
> Maybe. It will require changes in other parts of this doc.
> Thinking (here: https://gitlab.com/postgres/postgres/-/merge_requests/18/diffs)
> 
> Meanwhile, attached is v2
> 
> thanks for the comments

I looked over this issue thoroughly and I think I see the cause of the
confusion.  In step 8 we say:

    8. Stop both servers
    Streaming replication and log-shipping standby servers can remain
                                                           ---
    running until a later step.

Of course this has to be "must" and it would be good to explain why,
which I have done in the attached patch.

Secondly, in step 9 we say "verify the LSNs", but have a parenthetical
sentence that explains why they might not match:

    (There will be a mismatch if old standby servers were shut down before
    the old primary or if the old standby servers are still running.)

People might take that to mean that it is okay if this is the reason
they don't match, which is incorrect.  Better to tell them to keep the
streaming replication and log-shipping servers running so we don't need
that sentence.
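
For reference, the verification in step 9 boils down to extracting and comparing the "Latest checkpoint location" line of pg_controldata output across nodes. A sketch (the sample output lines below are made up; in practice you would run "pg_controldata $PGDATA" on each node):

```shell
# Extract the LSN from pg_controldata-style output read on stdin.
latest_ckpt() {
  awk -F':[ \t]*' '/^Latest checkpoint location/ {print $2}'
}

# Made-up sample lines, standing in for real pg_controldata output:
primary_line='Latest checkpoint location:           0/B000060'
standby_line='Latest checkpoint location:           0/B000060'

p=$(printf '%s\n' "$primary_line" | latest_ckpt)
s=$(printf '%s\n' "$standby_line" | latest_ckpt)

if [ "$p" = "$s" ]; then
  echo "OK: checkpoint locations match ($p)"
else
  echo "MISMATCH: primary=$p standby=$s -- do not proceed to rsync" >&2
fi
```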

The instructions are already long so I am hesitant to add more text
without a clear purpose.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Only you can decide what is important to you.

On Thu, Sep  7, 2023 at 01:52:45PM -0400, Bruce Momjian wrote:
> On Mon, Jul 10, 2023 at 02:37:24PM -0700, Nikolay Samokhvalov wrote:
> > Maybe. It will require changes in other parts of this doc.
> > Thinking (here: https://gitlab.com/postgres/postgres/-/merge_requests/18/diffs)
> > 
> > Meanwhile, attached is v2
> > 
> > thanks for the comments
> 
> I looked over this issue thoroughly and I think I see the cause of the
> confusion.  In step 8 we say:
> 
>     8. Stop both servers
>     Streaming replication and log-shipping standby servers can remain
>                                                            ---
>     running until a later step.
> 
> Of course this has to be "must" and it would be good to explain why,
> which I have done in the attached patch.
> 
> Secondly, in step 9 we say "verify the LSNs", but have a parenthetical
> sentence that explains why they might not match:
> 
>     (There will be a mismatch if old standby servers were shut down before
>     the old primary or if the old standby servers are still running.)
> 
> People might take that to mean that it is okay if this is the reason
> they don't match, which is incorrect.  Better to tell them to keep the
> streaming replication and log-shipping servers running so we don't need
> that sentence.
> 
> The instructions are already long so I am hesitant to add more text
> without a clear purpose.

Patch applied back to PG 11.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Only you can decide what is important to you.