Thread: [PG_UPGRADE] 9.6 to 10.5
Hi,
I tried to upgrade my PostgreSQL cluster from version 9.6 to 10.5 with the pg_upgrade tool.
I followed the steps described in the documentation (https://www.postgresql.org/docs/10/static/pgupgrade.html) and in the pg_upgrade man page.
Both clusters (source and target) were stopped with pg_ctl -D /DATA/CLS/clsdata stop commands.
When I try to upgrade my cluster with the following command:
pg_upgrade --old-datadir "/DATA/PPEM/clsdata" --new-datadir "/DATA/PPEM73/clsdata" --old-bindir "/u01/app/postgres/product/9.6.3.0/DERIV01/bin" --new-bindir "/u01/app/postgres/product/10.5.0.0/DERIV01/bin"
Postgres raises the following error:
Performing Consistency Checks
-----------------------------
Checking cluster versions ok
The source cluster was not shut down cleanly.
Failure, exiting
I tried restarting and shutting down the clusters with other methods (the -m options of pg_ctl, killing processes, …) but I still have the same issue.
Best regards,
Rémi DEGLAVE
Database Administrator
Lyreco
Phone: +33 (0)3 27 23 42 33
Rue du 19 Mars 1962 - 59770 MARLY - FRANCE
http://www.lyreco.com
supporting the educational development of children throughout the world.
Please, click here to discover Lyreco Actions for children.
On Fri, Aug 10, 2018 at 12:51:29PM +0000, DEGLAVE Remi wrote:
> The source cluster was not shut down cleanly.
>
> Failure, exiting
>
> I tried to restart and shutdown clusters with another methods (-m options of
> pg_ctl, killing processes, …) still have the same issue.

There is new code in PG 10.5 that detects that the server is cleanly shut down. You can no longer use '-m immediate' to shut down either server, but 'smart' and 'fast' should be fine. Can you run pg_controldata on each cluster before you run pg_upgrade to verify that they say "Shutdown":

    $ pg_controldata /u/pg/data
    ...
    Database cluster state:               shut down
                                          ---------

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +
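The check described above can be sketched in C as a scan for the "Database cluster state:" line of pg_controldata output. This is only an illustration, not pg_upgrade's code: the helper names are hypothetical, and it assumes English (C locale) output with the trailing newline still attached to the line.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of PostgreSQL): given one line of
 * pg_controldata output, return a pointer to the state value if this is
 * the "Database cluster state:" line, or NULL for any other line.
 */
const char *
cluster_state_value(const char *line)
{
    static const char label[] = "Database cluster state:";
    const char *p;

    assert(line != NULL);

    if (strncmp(line, label, strlen(label)) != 0)
        return NULL;

    /* skip the label, then the padding spaces in front of the value */
    p = line + strlen(label);
    while (*p == ' ')
        p++;
    return p;
}

/* Returns 1 only when the line reports exactly "shut down". */
int
line_reports_clean_shutdown(const char *line)
{
    const char *value = cluster_state_value(line);

    return value != NULL && strcmp(value, "shut down\n") == 0;
}
```

Feeding each line of the pg_controldata output to line_reports_clean_shutdown() and looking for a single match mirrors the manual check requested above.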
On Fri, Aug 10, 2018 at 12:12:45PM -0400, Bruce Momjian wrote:
> There is new code in PG 10.5 that detects that the server is cleanly
> shut down. You can no longer use '-m immediate' to shut down either
> server, but 'smart' and 'fast' should be fine. Can you run
> pg_controldata on each cluster before you run pg_upgrade to verify that
> they say "Shutdown":

You are talking about 244142d, right? I see this code bit:
+    if (strcmp(p, "shut down\n") != 0)
+    {
+        if (cluster == &old_cluster)
+            pg_fatal("The source cluster was not shut down cleanly.\n");
+        else
+            pg_fatal("The target cluster was not shut down cleanly.\n");
+    }

This seems incorrect to me in the case of standbys, as pg_controldata reports "shut down in recovery" in this case, and one can run pg_upgrade on a standby as well, no?
--
Michael
On Fri, Aug 10, 2018 at 06:42:40PM +0200, Michael Paquier wrote:
> This seems incorrect for me in the case of standbys, as pg_controldata
> reports in this case "shut down in recovery", and one can run pg_upgrade
> on a standby as well, no?

Oh, good point. I had not tested that. I can develop a patch to handle this. Was that the case in this upgrade report?

--
Bruce Momjian
On Fri, Aug 10, 2018 at 12:53:47PM -0400, Bruce Momjian wrote:
> Oh, good point. I had not tested that. I can develop a patch to handle
> this. Was that the case in this upgrade report?

I cannot say for sure whether this is the issue reported here, but the OP has mentioned "clusters", which may point to the fact that he is trying to upgrade standby(s) as well.

You apparently need a one-liner like this:
--- a/src/bin/pg_upgrade/controldata.c
+++ b/src/bin/pg_upgrade/controldata.c
@@ -150,7 +150,8 @@ get_control_data(ClusterInfo *cluster, bool live_check)
         /* remove leading spaces */
         while (*p == ' ')
             p++;
-        if (strcmp(p, "shut down\n") != 0)
+        if (strcmp(p, "shut down\n") != 0 &&
+            strcmp(p, "shut down in recovery\n") != 0)
         {
             if (cluster == &old_cluster)
                 pg_fatal("The source cluster was not shut down cleanly.\n");

I have not tested it though.
--
Michael
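To see the proposed comparison in isolation, here is a small standalone sketch. The function name is hypothetical; in pg_upgrade the test sits inline in get_control_data(), and the sketch assumes the state string still carries its trailing newline, as in the diff above.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical wrapper around the two exact comparisons from the proposed
 * one-liner: returns 1 when the state is "shut down" (primary) or
 * "shut down in recovery" (standby), 0 for every other state string.
 */
int
state_is_clean_shutdown(const char *state)
{
    assert(state != NULL);

    return strcmp(state, "shut down\n") == 0 ||
           strcmp(state, "shut down in recovery\n") == 0;
}
```

With this form, a state such as "in production" still falls through to the "not shut down cleanly" error, while both clean shutdown states are accepted.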
On Fri, Aug 10, 2018 at 07:23:29PM +0200, Michael Paquier wrote:
> You need visibly a one-liner like that:
> --- a/src/bin/pg_upgrade/controldata.c
> +++ b/src/bin/pg_upgrade/controldata.c
> @@ -150,7 +150,8 @@ get_control_data(ClusterInfo *cluster, bool live_check)
>          /* remove leading spaces */
>          while (*p == ' ')
>              p++;
> -        if (strcmp(p, "shut down\n") != 0)
> +        if (strcmp(p, "shut down\n") != 0 &&
> +            strcmp(p, "shut down in recovery\n") != 0)
>          {
>              if (cluster == &old_cluster)
>                  pg_fatal("The source cluster was not shut down cleanly.\n");
>
> I have not tested it though.

I was going to do:

    /* handle "shut down" and "shut down in recovery" (on standbys) */
    strncmp(p, "shut down", strlen("shut down"))

but maybe yours is clearer.

--
Bruce Momjian
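For comparison, the prefix-based alternative mentioned above can be sketched as follows. The function name is hypothetical, and this is not the variant that was committed; it only illustrates what a strncmp() prefix test accepts.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: returns 1 when the state string begins with
 * "shut down". That covers both "shut down" and "shut down in recovery",
 * but also any other string with the same prefix, which is why the two
 * exact comparisons are arguably easier to reason about.
 */
int
state_has_shut_down_prefix(const char *state)
{
    assert(state != NULL);

    return strncmp(state, "shut down", strlen("shut down")) == 0;
}
```

Note that "shutting down" is not matched, since it diverges from the prefix at the fifth character.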
On Fri, Aug 10, 2018 at 01:25:25PM -0400, Bruce Momjian wrote:
> I was going to do:
>
>     /* handle "shut down" and "shut down in recovery" (on standbys)
>     strncmp(p, "shut down", strlen("shut down"))
>
> but maybe yours is clearer.

Any method is fine for me. I prefer mine as that's somewhat more consistent with what happens in pg_rewind & co. As that is your code, feel free of course to use what you think is most suited ;)
--
Michael
On 10/08/18 at 13:12, Bruce Momjian wrote:
> There is new code in PG 10.5 that detects that the server is cleanly
> shut down. You can no longer use '-m immediate' to shut down either
> server, but 'smart' and 'fast' should be fine. Can you run
> pg_controldata on each cluster before you run pg_upgrade to verify that
> they say "Shutdown":

He did mention trying to shut down with different modes, and ended up with the same result after pg_upgrade.

I would recommend running pg_upgrade in verbose mode (add the -v option to the command), capturing the output, and opening a thread on the pgsql-general list (or sending it back here for further review).

--
Martín Marqués http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Aug 10, 2018 at 05:41:14PM -0300, Martin Marques wrote:
> He did mention trying to shutdown with different modes, and ended with
> the same result after pg_upgrade.
>
> I would recommend running pg_upgrade in verbose mode (add the -v option
> to the cmd), capture the output and open a thread in pgsql-general list
> (or send it back here for further review).

I have not seen any report from the original reporter so I have gone ahead and committed the fix suggested by Michael Paquier.

This means that standby upgrades will fail in 10.5 until 10.6 is released. Ugh! I guess users can upgrade to 10.4 and then do a minor upgrade to 10.5 as a workaround.

--
Bruce Momjian
On Tue, Aug 14, 2018 at 05:23:35PM -0400, Bruce Momjian wrote:
> I have not seen any report from the original reporter so I have gone
> ahead and committed the fix suggested by Michael Paquier.

Thanks Bruce.

> This means that standby upgrades will fail in 10.5 until 10.6 is
> released. Ugh! I guess users can upgrade to 10.4 and then do a minor
> upgrade to 10.5 as a workaround.

Yeah, that's bothersome depending on the distribution package though, hopefully folks would be able to use past package archives in this case as well.
--
Michael
On Thu, Aug 16, 2018 at 02:18:07AM +0200, Michael Paquier wrote:
> Yeah, that's bothersome depending on the distribution package though,
> hopefully folks would be able to use past package archives in this case
> as well.

Agreed.

--
Bruce Momjian
Hi,

Sorry for the delayed response.
Thanks for your feedback, I'll try a 10.4 upgrade and then a minor upgrade to 10.5.

Thanks and regards,

Rémi DEGLAVE

-----Original message-----
From: Bruce Momjian <bruce@momjian.us>
Sent: Tuesday, 14 August 2018 23:24
To: Martin Marques <martin.marques@2ndquadrant.com>
Cc: DEGLAVE Remi <remi.deglave@lyreco.com>; pgsql-bugs@postgresql.org; Michael Paquier <michael.paquier@gmail.com>
Subject: Re: [PG_UPGRADE] 9.6 to 10.5

> I have not seen any report from the original reporter so I have gone ahead and committed the fix suggested by Michael Paquier.
>
> This means that standby upgrades will fail in 10.5 until 10.6 is released. Ugh! I guess users can upgrade to 10.4 and then do a minor upgrade to 10.5 as a workaround.
2018-08-14 18:23 GMT-03:00 Bruce Momjian <bruce@momjian.us>:
>
> I have not seen any report from the original reporter so I have gone
> ahead and committed the fix suggested by Michael Paquier.
>
> This means that standby upgrades will fail in 10.5 until 10.6 is
> released. Ugh! I guess users can upgrade to 10.4 and then do a minor
> upgrade to 10.5 as a workaround.

I didn't want to jump in earlier as I'm not a pg_upgrade expert, and to be honest, haven't done any code reading around this tool, but I was puzzled by Michael's comment on pg_upgrade failing on a standby node. I have always been under the impression that pg_upgrade had to be executed on a primary server, and standbys had to be recloned or rsynced (which is just the old way of recloning).

Maybe there's something I wasn't aware of. But if that's the case, then the docs don't reflect this missing knowledge and we should amend them.

Regards,

--
Martín Marqués
On Thu, Aug 16, 2018 at 07:05:53AM +0000, DEGLAVE Remi wrote:
> Hi,
>
> Sorry for the response delay.
> Thanks for your feedbacks, I'll try a 10.4 upgrade and then a minor upgrade to 10.5.

Thanks, sorry for the inconvenience.

--
Bruce Momjian
On Thu, Aug 16, 2018 at 08:15:43AM -0300, Martín Marqués wrote:
> I didn't want to jump in earlier as I'm not a pg_upgrade expert, and
> to be honest, haven't done any code reading around this tool, but I
> was puzzled with Michael's comment on pg_upgrade failing on a standby
> node. I have always been under the impression that pg_upgrade had to
> be executed on a primary server, and standbys had to be recloned or
> rsynced (which is just the old way of recloning).
>
> Maybe there's something I wasn't aware of. But if that's the case,
> then the docs don't reflect this missing knowledge and we should amend
> it.

Ugh, what is the matter with me! I specifically wrote down the instructions on upgrading a standby in step 10, and even added details so I could remember how rsync worked, and then forgot all about it when dealing with this bug report:

    https://www.postgresql.org/docs/11/static/pgupgrade.html

    What this does is to record the links created by pg_upgrade's link mode
    that connect files in the old and new clusters on the primary server. It
    then finds matching files in the standby's old cluster and creates links
    for them in the standby's new cluster. Files that were not linked on the
    primary are copied from the primary to the standby. (They are usually
    small.) This provides rapid standby upgrades. Unfortunately, rsync
    needlessly copies files associated with temporary and unlogged tables
    because these files don't normally exist on standby servers.

My apologies. I am getting old!

Anyway, let me outline what each PG version does.

For primary server upgrades, 10.4 (and other minor releases) does not detect whether you used pg_ctl -m immediate for the shutdown, and performs the upgrade, but the upgraded cluster could be corrupt. 10.5 fixes that by reporting the improper shutdown.

For standby server upgrades, 10.4 allows a standby that is shut down to be upgraded, even though after the upgrade it cannot be reconnected as a standby. In 10.5, because the test for the standby shutdown message is missing, it reports that the server was improperly shut down. This is fixed in the git trees for 10.6.

So, I think 10.5 and the current source trees are fine for primary upgrades. How do we want to handle upgrading a shut-down standby? I think having a server go from standby to primary while pg_upgrade is running is just too dangerous. Specifically, if you shut down a standby and then remove recovery.conf, the first start of the server will make it become a primary, but at the time pg_upgrade is checking it, it still sees the standby shutdown state.

I think we just need to add an error message to say the server must be shut down as a _primary_ server, not as a standby server, and we can recommend reading the documentation on the use of rsync.

--
Bruce Momjian
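The behaviour proposed above amounts to a three-way decision on the control-file state, which can be sketched like this. The enum and function names are hypothetical and are not taken from the committed patch; the sketch again assumes the state string keeps its trailing newline.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical classification of the "Database cluster state" value. */
typedef enum
{
    STATE_PRIMARY_SHUT_DOWN,    /* "shut down": safe to upgrade */
    STATE_STANDBY_SHUT_DOWN,    /* "shut down in recovery": refuse, point to the rsync docs */
    STATE_NOT_CLEAN             /* anything else: not shut down cleanly */
} ShutdownState;

ShutdownState
classify_cluster_state(const char *state)
{
    assert(state != NULL);

    if (strcmp(state, "shut down in recovery\n") == 0)
        return STATE_STANDBY_SHUT_DOWN;
    if (strcmp(state, "shut down\n") == 0)
        return STATE_PRIMARY_SHUT_DOWN;
    return STATE_NOT_CLEAN;
}

/* Only a cleanly shut down primary is allowed through. */
int
cluster_may_be_upgraded(const char *state)
{
    return classify_cluster_state(state) == STATE_PRIMARY_SHUT_DOWN;
}
```

Keeping the standby case as its own value lets the caller print a dedicated message for it rather than the generic "not shut down cleanly" error.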
On 16/08/18 at 13:36, Bruce Momjian wrote:
>
> I think we just need to add an error message to say the server must be
> shut down as a _primary_ server, not as a standby server, and we can
> recommend to read the documentation on the use of rsync.

That's what I was thinking pg_upgrade should do.

--
Martín Marqués
On Thu, Aug 16, 2018 at 12:36:20PM -0400, Bruce Momjian wrote:
> I think we just need to add an error message to say the server must be
> shut down as a _primary_ server, not as a standby server, and we can
> recommend to read the documentation on the use of rsync.

Attached is a proposed patch.

--
Bruce Momjian
2018-08-16 14:58 GMT-03:00 Bruce Momjian <bruce@momjian.us>:
> On Thu, Aug 16, 2018 at 12:36:20PM -0400, Bruce Momjian wrote:
>> I think we just need to add an error message to say the server must be
>> shut down as a _primary_ server, not as a standby server, and we can
>> recommend to read the documentation on the use of rsync.
>
> Attached is a proposed patch.

The logic in the patch is ok, but I'd use a different error message, one that just mentions that you can only upgrade a primary node:

    The source cluster was shut down while in recovery mode, but
    pg_upgrade expected a primary node.

The error referring to the target node should be similar, although I can't think of how to phrase it without saying "how on earth did you get a standby for the target! RTFM!" :-D

Regards,

--
Martín Marqués
On Thu, Aug 16, 2018 at 04:41:04PM -0300, Martín Marqués wrote:
> The logic in the patch is ok, but I'd use a different error message,
> one that just mentions that you can only upgrade a primary node:
>
> The source cluster was shut down while in recovery mode, but
> pg_upgrade expected a primary node.

Well, if people are hitting the error because they are using pg_upgrade on standbys, I assume we should give them some instruction that you should not do that, no?

> The error referring to the target node should be similar, although I
> can't think of how to phrase it without saying "how on earth did you
> get a standby for the target! RTFM!" :-D

Yeah, for consistency I made it match. ;-)

--
Bruce Momjian
On Thu, Aug 16, 2018 at 05:23:58PM -0400, Bruce Momjian wrote:
> Well, if people are hitting the error because they are using pg_upgrade
> on standbys, I assume we should give them some instruction that you
> should not do that, no?
>
> Yeah, for consistency I made it match. ;-)

I have applied the patch, with a better error message that mentions the documentation:

    https://git.postgresql.org/pg/commitdiff/b94f7b5350e97ef0587c0c64aed6eb940d964c06

--
Bruce Momjian