Thread: PG Upgrade with hardlinks, when to start/stop master and replicas
Hello everyone!
We are about to upgrade a 6-instance cluster from pg92 to pg10 using pg_upgrade with hard links and rsync.
Our preliminary tests are working really well so far, but one question has popped up that we feel is really critical, because it has an important impact on our failover plan.
After reading the pg_upgrade documentation multiple times, it seems that after running pg_upgrade on the primary instance, we can't start it until we run rsync from the primary to the standby. I'm understanding this from the following section in the pg_upgrade manual page.
```
You will not be running pg_upgrade on the standby servers, but rather rsync on the
primary. Do not start any servers yet.
```
I'm understanding the `any` as primary and standbys.
On the other hand, we've been doing tests that start the primary instance as soon as pg_upgrade is done. These tests have worked perfectly fine so far. We make the rsync call with the primary instance running, and the standby can start later on, after rsync is done and we copy over the new configuration files.
If what we are doing is wrong and we need to run `rsync` before starting the primary instance, that would mean that the primary and the standby are not usable if pg10 doesn't start correctly on the primary, right?
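For concreteness, the sequence we are testing looks roughly like the following. This is only a sketch following the pg_upgrade manual page; all paths, bindirs, and the standby hostname are made up and would need adjusting, and it is not runnable as-is:

```shell
# Hypothetical layout -- adjust paths, bindirs, and hosts to your setup.
OLD=/var/lib/postgresql/9.2/main
NEW=/var/lib/postgresql/10/main
STANDBY=standby1.example.com

# Stop the old primary (the standby must also be stopped).
pg_ctl -D "$OLD" stop

# Upgrade in link mode: data files are hard-linked, not copied.
/usr/lib/postgresql/10/bin/pg_upgrade \
    --link \
    --old-datadir "$OLD" --new-datadir "$NEW" \
    --old-bindir /usr/lib/postgresql/9.2/bin \
    --new-bindir /usr/lib/postgresql/10/bin

# Per the docs: run rsync from the directory ABOVE the old and new
# data directories, while everything is still shut down.
rsync --archive --delete --hard-links --size-only --no-inc-recursive \
    /var/lib/postgresql/9.2 /var/lib/postgresql/10 \
    "$STANDBY":/var/lib/postgresql

# Only now start the new primary, then the rsync'ed standby.
pg_ctl -D "$NEW" start
```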
I hope my question is clear enough!
Best,
Martín
Martín Fernández wrote:
> After reading the pg_upgrade documentation multiple times, it seems that after running pg_upgrade on the primary instance, we can't start it until we run rsync from the primary to the standby. I'm understanding this from the following section in the pg_upgrade manual page.
>
> You will not be running pg_upgrade on the standby servers, but rather rsync on the
> primary. Do not start any servers yet.
Immediately following, you can read:
If you did not use link mode, do not have or do not want to use rsync, or want an easier
solution, skip the instructions in this section and simply recreate the standby servers
once pg_upgrade completes and the new primary is running.
So this is not compulsory, it's just an efficient method to quickly get the standby
server updated. There is nothing wrong with rebuilding the standby later.
Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com
Hello Martín
Well, if one follows the sequence in the PostgreSQL help page:
Usage
(...)
7. Stop both servers
(...)
10. Upgrade Streaming Replication and Log-Shipping standby servers
(...)
12. Start the new server
The new server can now be safely started, and then any rsync'ed standby servers.
I would think you have to wait for all the rsyncs to the replicas to finish before starting the master... But I even have an additional question: if I have 2 or more replicas, could you first synchronize the master with one replica (rsync), start the master, and then use the synchronized replica to synchronize the rest of the replicas while the master is already back in service?
On Mon, Feb 18, 2019 at 3:21 PM, Laurenz Albe (laurenz.albe@cybertec.at) wrote:
Martín Fernández wrote:
> After reading the pg_upgrade documentation multiple times, it seems that after running pg_upgrade on the primary instance, we can't start it until we run rsync from the primary to the standby. I'm understanding this from the following section in the pg_upgrade manual page.
>
> You will not be running pg_upgrade on the standby servers, but rather rsync on the
> primary. Do not start any servers yet.
Immediately following, you can read:
If you did not use link mode, do not have or do not want to use rsync, or want an easier
solution, skip the instructions in this section and simply recreate the standby servers
once pg_upgrade completes and the new primary is running.
So this is not compulsory, it's just an efficient method to quickly get the standby
server updated.
There is nothing wrong with rebuilding the standby later.
Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com
Regards,
Ing. Hellmuth I. Vargas S.
Esp. Telemática y Negocios por Internet
Oracle Database 10g Administrator Certified Associate
EnterpriseDB Certified PostgreSQL 9.3 Associate
Laurenz and Hellmuth, replying to both of you!
Thanks for the quick replies BTW!
Martín
On Mon, Feb 18, 2019 at 5:32 PM Hellmuth Vargas <hivs77@gmail.com> wrote:
Hello Martín
Well, if one follows the sequence in the PostgreSQL help page:
Usage
(...)
7. Stop both servers
(...)
10. Upgrade Streaming Replication and Log-Shipping standby servers
(...)
12. Start the new server
The new server can now be safely started, and then any rsync'ed standby servers.
I would think you have to wait for all the rsyncs to the replicas to finish before starting the master... But I even have an additional question: if I have 2 or more replicas, could you first synchronize the master with one replica (rsync), start the master, and then use the synchronized replica to synchronize the rest of the replicas while the master is already back in service?
I asked myself that question as well. I'm basically trying to minimize the number of standbys that are affected by the upgrade. The documentation states that you can do something similar to what you describe; the only thing you need to make sure of is not to start the standby if you are going to use it as an rsync source for another standby. Does that make sense?
On Mon, Feb 18, 2019 at 3:21 PM, Laurenz Albe (laurenz.albe@cybertec.at) wrote:
Martín Fernández wrote:
> After reading the pg_upgrade documentation multiple times, it seems that after running pg_upgrade on the primary instance, we can't start it until we run rsync from the primary to the standby. I'm understanding this from the following section in the pg_upgrade manual page.
>
> You will not be running pg_upgrade on the standby servers, but rather rsync on the
> primary. Do not start any servers yet.
Immediately following, you can read:
If you did not use link mode, do not have or do not want to use rsync, or want an easier
solution, skip the instructions in this section and simply recreate the standby servers
once pg_upgrade completes and the new primary is running.
So this is not compulsory, it's just an efficient method to quickly get the standby
server updated.
There is nothing wrong with rebuilding the standby later.
I think that by `recreate` here we are talking about pg_basebackup, right? That won't be acceptable, because our database would take days to copy. We need to use rsync and leverage the hard links.
Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com
--
Regards,
Ing. Hellmuth I. Vargas S.
Esp. Telemática y Negocios por Internet
Oracle Database 10g Administrator Certified Associate
EnterpriseDB Certified PostgreSQL 9.3 Associate
Greetings,
* Martín Fernández (fmartin91@gmail.com) wrote:
> After reading the pg_upgrade documentation multiple times, it seems that after running pg_upgrade on the primary instance, we can't start it until we run rsync from the primary to the standby. I'm understanding this from the following section in the pg_upgrade manual page.
>
> ```
> You will not be running pg_upgrade on the standby servers, but rather rsync on the
> primary. Do not start any servers yet.
> ```
>
> I'm understanding the `any` as primary and standbys.
Yes, that's correct, you shouldn't start up anything yet.
> On the other hand, we've been doing tests that start the primary instance as soon as pg_upgrade is done. These tests have worked perfectly fine so far. We make the rsync call with the primary instance running and the standby can start later on after rsync is done and we copy the new configuration files.
This is like taking an online backup of the primary without actually doing pg_start_backup / pg_stop_backup and following the protocol for that, meaning that the replica will start up without a backup_label and will think it's at whatever point in the WAL stream that the pg_control file says it's at as of whenever the rsync copies that file.
That is NOT SAFE and it's a sure way to end up with corruption.
The rsync while everything is down should be pretty fast, unless you have unlogged tables that are big (in which case, you should truncate them before shutting down the primary) or temporary tables left around (which you should clean up) or just generally other things that a replica doesn't normally have.
If you can't have any downtime during this process then, imv, the answer is to build out a new replica that will essentially be a 'throw-away', move all the read load over to it and then go through the documented pg_upgrade process with the primary and the other replicas, then flip the traffic back to the primary + original replicas, and then you can either throw away the replica that was kept online or rebuild it using the traditional methods of pg_basebackup (or for a larger system, you could use pgbackrest, which can run in parallel and is much, much faster than pg_basebackup).
> If what we are doing is wrong, we need to run `rsync` before starting the primary instance, that would mean that the primary and the standby are not usable if pg10 doesn't start correctly in the primary right ?
This is another reason why it's good to have an independent replica, as it can be a fail-safe if things go completely south (you can just promote it and have it be the primary and then rebuild replicas using the regular backup+restore method and figure out what went wrong with the pg10 migration).
Thanks!
Stephen
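Stephen's point that the rsync while everything is down "should be pretty fast" comes down to hard links: after `pg_upgrade --link`, the old and new clusters share the same inodes, so `rsync --hard-links` can recreate links on the standby instead of re-sending the data. A toy demonstration with plain files, no PostgreSQL involved (paths are throwaway temp dirs; `stat -c` is the GNU coreutils form):

```shell
set -eu
tmp=$(mktemp -d)
mkdir -p "$tmp/old" "$tmp/new"
echo "1GB of table data (pretend)" > "$tmp/old/relfile"

# What pg_upgrade --link does per data file: a hard link, not a copy.
ln "$tmp/old/relfile" "$tmp/new/relfile"

# Both names point at the same inode: link count is 2, no extra disk used.
stat -c '%h' "$tmp/new/relfile"                           # prints: 2
cmp -s "$tmp/old/relfile" "$tmp/new/relfile" && echo "same data, same inode"
rm -rf "$tmp"
```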
Hi
But could you do the following procedure?:
pg_upgrade of the master
rsync with a hot standby
start the master
start the hot standby
stop the hot standby and rsync the other hot standby from the migrated hot standby?
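A rough command sketch of that cascade, in the style of the pg_upgrade docs. Hostnames and paths are hypothetical, and this is not runnable as-is; the key invariant is that standby2 is never started between the upgrade and its rsync:

```shell
# Hop 1 -- on the master, with master and standby1 both shut down:
pg_upgrade --link ...   # see the pg_upgrade docs for the full invocation
rsync --archive --delete --hard-links --size-only --no-inc-recursive \
    /var/lib/postgresql/9.2 /var/lib/postgresql/10 \
    standby1.example.com:/var/lib/postgresql
pg_ctl -D /var/lib/postgresql/10/main start      # master back in service

# Hop 2 -- later, on standby1 (shut it down again for the copy;
# standby2 has stayed down the whole time):
pg_ctl -D /var/lib/postgresql/10/main stop
rsync --archive --delete --hard-links --size-only --no-inc-recursive \
    /var/lib/postgresql/9.2 /var/lib/postgresql/10 \
    standby2.example.com:/var/lib/postgresql
pg_ctl -D /var/lib/postgresql/10/main start      # standby1 back online
```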
On Tue, Feb 19, 2019 at 6:12 AM, Stephen Frost (sfrost@snowman.net) wrote:
Greetings,
* Martín Fernández (fmartin91@gmail.com) wrote:
> After reading the pg_upgrade documentation multiple times, it seems that after running pg_upgrade on the primary instance, we can't start it until we run rsync from the primary to the standby. I'm understanding this from the following section in the pg_upgrade manual page.
>
> ```
> You will not be running pg_upgrade on the standby servers, but rather rsync on the
> primary. Do not start any servers yet.
> ```
>
> I'm understanding the `any` as primary and standbys.
Yes, that's correct, you shouldn't start up anything yet.
> On the other hand, we've been doing tests that start the primary instance as soon as pg_upgrade is done. These tests have worked perfectly fine so far. We make the rsync call with the primary instance running and the standby can start later on after rsync is done and we copy the new configuration files.
This is like taking an online backup of the primary without actually
doing pg_start_backup / pg_stop_backup and following the protocol for
that, meaning that the replica will start up without a backup_label and
will think it's at whatever point in the WAL stream that the pg_control
file says it's at as of whenever the rsync copies that file.
That is NOT SAFE and it's a sure way to end up with corruption.
The rsync while everything is down should be pretty fast, unless you
have unlogged tables that are big (in which case, you should truncate
them before shutting down the primary) or temporary tables left around
(which you should clean up) or just generally other things that a
replica doesn't normally have.
If you can't have any downtime during this process then, imv, the answer
is to build out a new replica that will essentially be a 'throw-away',
move all the read load over to it and then go through the documented
pg_upgrade process with the primary and the other replicas, then flip
the traffic back to the primary + original replicas and then you can
either throw away the replica that was kept online or rebuild it using
the traditional methods of pg_basebackup (or for a larger system, you
could use pgbackrest which can run in parallel and is much, much faster
than pg_basebackup).
> If what we are doing is wrong, we need to run `rsync` before starting the primary instance, that would mean that the primary and the standby are not usable if pg10 doesn't start correctly in the primary right ?
This is another reason why it's good to have an independent replica, as
it can be a fail-safe if things go completely south (you can just
promote it and have it be the primary and then rebuild replicas using
the regular backup+restore method and figure out what went wrong with
the pg10 migration).
Thanks!
Stephen
Regards,
Ing. Hellmuth I. Vargas S.
Esp. Telemática y Negocios por Internet
Oracle Database 10g Administrator Certified Associate
EnterpriseDB Certified PostgreSQL 9.3 Associate
Greetings,
* Hellmuth Vargas (hivs77@gmail.com) wrote:
> But could you do the following procedure?:
> pg_upgrade of the master
> rsync with a hot standby
The above should be alright provided both the primary and the standby are down and the instructions in the pg_upgrade docs are followed.
> start the master
> start the hot standby
So, start both the primary and the replica? That part should be fine by itself.
> stop the hot standby and rsync the other hot standby from the migrated hot standby?
At some later point, shut down the replica completely, then do an rsync from that replica to the other replica and build its hard-link tree that way, and update anything that's changed while the 'migrated' replica was online? I don't see any obvious issue with that, as the result should mean that the two replicas are identical from PG's perspective from that point moving forward.
Ultimately, it really depends on your specific environment though, of course. It also might not be a bad idea to do a regular backup of the upgraded primary and then restore that to the second replica, just to make sure you have that whole process working and to test out your restore process.
Thanks!
Stephen
Attachment
Thank you Stephen 👍
On Tue, Feb 19, 2019 at 10:05 AM, Stephen Frost (sfrost@snowman.net) wrote:
Greetings,
* Hellmuth Vargas (hivs77@gmail.com) wrote:
> But could you do the following procedure?:
> pg_upgrade of the master
> rsync with a hot standby
The above should be alright provided both the primary and the standby
are down and the instructions in the pg_upgrade docs are followed.
> start the master
> start the hot standby
So, start both the primary and the replica? That part should be fine by
itself.
> stop hot standby and rsync the other hot standby with the migrated hot
> standby?
At some later point, shut down the replica completely, then do an rsync
from that replica to the other replica and build its hard-link tree that
way, and update anything that's changed while the 'migrated' replica was
online? I don't see any obvious issue with that as the result should
mean that the two replicas are identical from PG's perspective from that
point moving forward.
Ultimately, it really depends on your specific environment though, of
course. It also might not be a bad idea to do a regular backup of the
upgraded primary and then restore that to the second replica, just to
make sure you have that whole process working and to test out your
restore process.
Thanks!
Stephen
Regards,
Ing. Hellmuth I. Vargas S.
Esp. Telemática y Negocios por Internet
Oracle Database 10g Administrator Certified Associate
EnterpriseDB Certified PostgreSQL 9.3 Associate
Stephen,
Thanks for the information! I've refactored our migration scripts to follow the suggestions.
One extra question that popped up. As long as we don't start the standby (after running rsync), we can always `rm -f $PGDATA_10` and promote the standby if necessary for failover, right? We also need to `mv` pg_control.old to pg_control in the old data directory.
Thanks!
Martín
On Tue, Feb 19th, 2019 at 12:39 PM, Hellmuth Vargas <hivs77@gmail.com> wrote:
Thank you Stephen 👍
On Tue, Feb 19, 2019 at 10:05 AM, Stephen Frost (sfrost@snowman.net) wrote:
Greetings,
* Hellmuth Vargas (hivs77@gmail.com) wrote:
> But could you do the following procedure?:
> pg_upgrade of the master
> rsync with a hot standby
The above should be alright provided both the primary and the standby
are down and the instructions in the pg_upgrade docs are followed.
> start the master
> start the hot standby
So, start both the primary and the replica? That part should be fine by
itself.
> stop hot standby and rsync the other hot standby with the migrated hot
> standby?
At some later point, shut down the replica completely, then do an rsync
from that replica to the other replica and build its hard-link tree that
way, and update anything that's changed while the 'migrated' replica was
online? I don't see any obvious issue with that as the result should
mean that the two replicas are identical from PG's perspective from that
point moving forward.
Ultimately, it really depends on your specific environment though, of
course. It also might not be a bad idea to do a regular backup of the
upgraded primary and then restore that to the second replica, just to
make sure you have that whole process working and to test out your
restore process.
Thanks!
Stephen
--
Regards,
Ing. Hellmuth I. Vargas S.
Esp. Telemática y Negocios por Internet
Oracle Database 10g Administrator Certified Associate
EnterpriseDB Certified PostgreSQL 9.3 Associate
Greetings,
* Martín Fernández (fmartin91@gmail.com) wrote:
> Thanks for the information! I've refactored our migration scripts to follow the suggestions.
Please don't top-post on these mailing lists.
> One extra question that popped up. As long as we don't start the standby (after running rsync), we can always `rm -f $PGDATA_10` and promote the standby if necessary for failover right ? We also need to `mv` pg_control.old to pg_control in the old data directory.
Not sure which standby we're talking about here, but in general, yes, as long as you haven't actually started the system after the pg_upgrade/rsync, you should be able to blow away the new cluster that pg_upgrade/rsync created and start the old cluster back up and promote it (if necessary) and use it.
Note that you should *not* need to do anything with pg_control; I have no idea what you're referring to there, but the old cluster should have the pg_control file and all the catalog tables in place from before the pg_upgrade/rsync (those aren't touched during the pg_upgrade/rsync process) and you would just need to start up the old binaries pointing at the old PG data directory and everything should just work.
Thanks!
Stephen
Stephen,
@bilby91
On Tue, Feb 19, 2019 at 1:37 PM Stephen Frost <sfrost@snowman.net> wrote:
Greetings,
* Martín Fernández (fmartin91@gmail.com) wrote:
> Thanks for the information! I've refactored our migration scripts to follow the suggestions.
Please don't top-post on these mailing lists.
> One extra question that popped up. As long as we don't start the standby (after running rsync), we can always `rm -f $PGDATA_10` and promote the standby if necessary for failover right ? We also need to `mv` pg_control.old to pg_control in the old data directory.
Not sure which standby we're talking about here, but in general, yes, as
long as you haven't actually started the system after the
pg_upgrade/rsync, you should be able to blow away the new cluster that
pg_upgrade/rsync created and start the old cluster back up and promote
it (if necessary) and use it.
Note that you should *not* need to do anything with pg_control, I have
no idea what you're referring to there, but the old cluster should have
the pg_control file and all the catalog tables in place from before the
pg_upgrade/rsync (those aren't touched during the pg_upgrade/rsync
process) and you would just need to start up the old binaries pointing
at the old PG data directory and everything should just work.
I did some successful tests yesterday around this scenario. The standby in this context is the one that received the rsync from the master but was never started. The old data directory stays intact, except that global/pg_control was renamed with a .old suffix.
I have found the documentation on pg_upgrade that states this:
```
If you ran pg_upgrade without --link or did not start the new server, the old cluster was not modified except that, if linking started, a .old suffix was appended to $PGDATA/global/pg_control. To reuse the old cluster, possibly remove the .old suffix from $PGDATA/global/pg_control; you can then restart the old cluster.
```
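The failback path described in that passage can be exercised with plain files. This is only a toy simulation (temp dirs stand in for the 9.2 and 10 data directories; no PostgreSQL is involved), showing the state pg_upgrade --link leaves behind and the two steps to revert:

```shell
set -eu
tmp=$(mktemp -d)
old="$tmp/9.2/main"; new="$tmp/10/main"
mkdir -p "$old/global" "$new/base"

# State after pg_upgrade --link: the old pg_control has been renamed
# to pg_control.old so the old cluster cannot be started by accident.
echo "control data" > "$old/global/pg_control.old"

# Failback: discard the new cluster, restore pg_control, then the old
# binaries can be started against the old data directory.
rm -rf "$new"
mv "$old/global/pg_control.old" "$old/global/pg_control"

test -f "$old/global/pg_control" && echo "old cluster restorable"
rm -rf "$tmp"
```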
Thanks!
Stephen
Greetings,
* Martín Fernández (fmartin91@gmail.com) wrote:
> I did some successful tests yesterday around this scenario. That standby in this context is the one that received the rsync from the master but was never started. The old data directory stays intact except for the fact that global/pg_control was renamed with a .old
>
> I have found the documentation on pg_upgrade that states this:
>
> ` If you ran pg_upgrade without --link or did not start the new server, the old cluster was not modified except that, if linking started, a .old suffix was appended to $PGDATA/global/pg_control. To reuse the old cluster, possibly remove the .old suffix from $PGDATA/global/pg_control; you can then restart the old cluster.`
Ah, right, I forgot that it did that, fair enough.
I've never been thrilled with that particular approach due to the inherent risks of people messing directly with files like pg_control, but that's how it is for now.
Thanks!
Stephen
On Tue, Feb 19, 2019 at 12:25:24PM -0500, Stephen Frost wrote:
> Ah, right, I forgot that it did that, fair enough.
>
> I've never been thrilled with that particular approach due to the
> inherent risks of people messing directly with files like pg_control,
> but that's how it is for now.
There was too much concern that users would accidentally start the old server at some later point, and its files would be hard linked to the new live server, leading to disaster.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +
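The disaster Bruce describes is easy to demonstrate with plain files, since hard-linked names share one inode: a write through the old cluster's name changes the file the new, live cluster is using. Toy simulation in a temp dir, no PostgreSQL involved:

```shell
set -eu
tmp=$(mktemp -d)
mkdir -p "$tmp/old" "$tmp/new"
echo "page v1" > "$tmp/old/relfile"
ln "$tmp/old/relfile" "$tmp/new/relfile"    # what pg_upgrade --link did

# "Accidentally starting the old server" amounts to writing through
# the old name:
echo "old server scribbles here" > "$tmp/old/relfile"

# The live new cluster sees the damage, because it is the same inode:
cat "$tmp/new/relfile"                      # prints: old server scribbles here
rm -rf "$tmp"
```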
Greetings,
* Bruce Momjian (bruce@momjian.us) wrote:
> On Tue, Feb 19, 2019 at 12:25:24PM -0500, Stephen Frost wrote:
> > Ah, right, I forgot that it did that, fair enough.
> >
> > I've never been thrilled with that particular approach due to the
> > inherent risks of people messing directly with files like pg_control,
> > but that's how it is for now.
>
> There was too much concern that users would accidentally start the old
> server at some later point, and its files would be hard linked to the
> new live server, leading to disaster.
Sure, I understand that concern, just wish there was a better approach we could use for "DO NOT START THIS SERVER" rather than moving of the pg_control file.
Thanks!
Stephen
On Thu, Feb 21, 2019 at 09:31:32PM -0500, Stephen Frost wrote:
> * Bruce Momjian (bruce@momjian.us) wrote:
> > There was too much concern that users would accidentally start the old
> > server at some later point, and its files would be hard linked to the
> > new live server, leading to disaster.
>
> Sure, I understand that concern, just wish there was a better approach
> we could use for "DO NOT START THIS SERVER" rather than moving of the
> pg_control file.
As ugly as it is, I have never heard of a better solution.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +
Bruce Momjian <bruce@momjian.us> writes:
> On Thu, Feb 21, 2019 at 09:31:32PM -0500, Stephen Frost wrote:
>> Sure, I understand that concern, just wish there was a better approach
>> we could use for "DO NOT START THIS SERVER" rather than moving of the
>> pg_control file.
> As ugly as it is, I have never heard of a better solution.
system("rm -rf $OLDPGDATA") ... nah, that is not a better idea.
regards, tom lane
On Fri, Feb 22, 2019 at 2:03 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Bruce Momjian <bruce@momjian.us> writes:
> On Thu, Feb 21, 2019 at 09:31:32PM -0500, Stephen Frost wrote:
>> * Bruce Momjian (bruce@momjian.us) wrote:
>>> There was too much concern that users would accidentally start the old
>>> server at some later point, and its files would be hard linked to the
>>> new live server, leading to disaster.
I think this is a great solution. Knowing that neither a human nor a supervisor process can mess up the hard links is something I really appreciate.
>> Sure, I understand that concern, just wish there was a better approach
>> we could use for "DO NOT START THIS SERVER" rather than moving of the
>> pg_control file.
> As ugly as it is, I have never heard of a better solution.
system("rm -rf $OLDPGDATA") ... nah, that is not a better idea.
regards, tom lane