Thread: [PATCH] Disable bgworkers during servers start in pg_upgrade
Hello,

We found an issue in pg_upgrade on a cluster with a third-party background worker. The upgrade goes fine, but the new cluster is then in an inconsistent state. The background worker comes from the PoWA extension, but the issue does not appear to be related to this particular code. Here is a shell script to reproduce the issue (error at the end):

OLDBINDIR=/usr/lib/postgresql/11/bin
NEWBINDIR=/usr/lib/postgresql/13/bin

OLDDATADIR=$(mktemp -d)
NEWDATADIR=$(mktemp -d)

$OLDBINDIR/initdb -D $OLDDATADIR
echo "unix_socket_directories = '/tmp'" >> $OLDDATADIR/postgresql.auto.conf
echo "shared_preload_libraries = 'pg_stat_statements, powa'" >> $OLDDATADIR/postgresql.auto.conf
$OLDBINDIR/pg_ctl -D $OLDDATADIR -l $OLDDATADIR/pgsql.log start
$OLDBINDIR/createdb -h /tmp powa
$OLDBINDIR/psql -h /tmp -d powa -c "CREATE EXTENSION powa CASCADE"
$OLDBINDIR/pg_ctl -D $OLDDATADIR -m fast stop

$NEWBINDIR/initdb -D $NEWDATADIR
cp $OLDDATADIR/postgresql.auto.conf $NEWDATADIR/postgresql.auto.conf

$NEWBINDIR/pg_upgrade --old-datadir $OLDDATADIR --new-datadir $NEWDATADIR --old-bindir $OLDBINDIR
$NEWBINDIR/pg_ctl -D $NEWDATADIR -l $NEWDATADIR/pgsql.log start
$NEWBINDIR/psql -h /tmp -d powa -c "SELECT 1 FROM powa_snapshot_metas"
# ERROR:  MultiXactId 1 has not been created yet -- apparent wraparound

(This needs PoWA to be installed; packages are available in the pgdg repositories, as postgresql-<pgversion>-powa on Debian or powa_<pgversion> on RedHat, or directly from source code at https://github.com/powa-team/powa-archivist.)

As far as I currently understand, this is due to the data to be migrated being somewhat inconsistent (from the perspective of pg_upgrade) when the old cluster and its background workers get started in pg_upgrade during the "checks" step. (The old cluster itself remains sane.)
As a solution, it seems that, for similar reasons that we restrict socket access to prevent accidental connections (from commit f763b77193), we should also prevent background workers from starting at this step. Please find attached a patch implementing this.

Thanks for considering,
Denis
Attachment
Hi,

On 2021-01-21 16:23:58 +0100, Denis Laxalde wrote:
> We found an issue in pg_upgrade on a cluster with a third-party
> background worker. The upgrade goes fine, but the new cluster is then in
> an inconsistent state. The background worker comes from the PoWA
> extension but the issue does not appear to be related to this particular
> code.

Well, it does imply that that bgworker did something, as the pure existence of a bgworker shouldn't affect anything. Presumably the issue is that the bgworker actually does transactional writes, which causes problems because the xids / multixacts from early during pg_upgrade won't actually be valid after we do pg_resetxlog etc.

> As a solution, it seems that, for similar reasons that we restrict
> socket access to prevent accidental connections (from commit
> f763b77193), we should also prevent background workers from starting at
> this step.

I think that'd have quite the potential for negative impact - imagine extensions that refuse to be loaded outside of shared_preload_libraries (e.g. because they need to allocate shared memory) but that are required during the course of pg_upgrade (e.g. because they provide a table AM, a PL or such). Those libraries will then be loaded during the upgrade anyway (due to the _PG_init() hook being called when functions from the extension are needed, e.g. the table AM or PL handler).

Nor is it clear to me that the only way this would be problematic is with shared_preload_libraries. A library in local_preload_libraries, or a demand-loaded library, can trigger bgworkers (or database writes in some other form) as well.

I wonder if we could

a) set default_transaction_read_only to true, and explicitly change it
   in the sessions that need that.
b) when in binary upgrade mode / -b, error out on all WAL writes in
   sessions that don't explicitly set a session-level GUC to allow
   writes.

Greetings,

Andres Freund
Hi,

Andres Freund a écrit :
> On 2021-01-21 16:23:58 +0100, Denis Laxalde wrote:
> > We found an issue in pg_upgrade on a cluster with a third-party
> > background worker. The upgrade goes fine, but the new cluster is then in
> > an inconsistent state. The background worker comes from the PoWA
> > extension but the issue does not appear to be related to this particular
> > code.
>
> Well, it does imply that that bgworker did something, as the pure
> existence of a bgworker shouldn't affect anything. Presumably the issue
> is that the bgworker actually does transactional writes, which causes
> problems because the xids / multixacts from early during pg_upgrade
> won't actually be valid after we do pg_resetxlog etc.
>
> > As a solution, it seems that, for similar reasons that we restrict
> > socket access to prevent accidental connections (from commit
> > f763b77193), we should also prevent background workers from starting at
> > this step.
>
> I think that'd have quite the potential for negative impact - imagine
> extensions that refuse to be loaded outside of shared_preload_libraries
> (e.g. because they need to allocate shared memory) but that are required
> during the course of pg_upgrade (e.g. because they provide a table AM, a
> PL or such). Those libraries will then be loaded during the upgrade
> anyway (due to the _PG_init() hook being called when functions from the
> extension are needed, e.g. the table AM or PL handler).
>
> Nor is it clear to me that the only way this would be problematic is
> with shared_preload_libraries. A library in local_preload_libraries, or
> a demand-loaded library, can trigger bgworkers (or database writes in
> some other form) as well.

Thank you for those insights!

> I wonder if we could
>
> a) set default_transaction_read_only to true, and explicitly change it
>    in the sessions that need that.
> b) when in binary upgrade mode / -b, error out on all WAL writes in
>    sessions that don't explicitly set a session-level GUC to allow
>    writes.

Solution "a" appears to be enough to solve the problem described in my initial email. See attached patch implementing this.

Cheers,
Denis
Attachment
Re: [PATCH] Disable bgworkers during servers start in pg_upgrade
From: Jehan-Guillaume de Rorthais
Hi,

On Wed, 27 Jan 2021 11:25:11 +0100
Denis Laxalde <denis.laxalde@dalibo.com> wrote:
> Andres Freund a écrit :
> > On 2021-01-21 16:23:58 +0100, Denis Laxalde wrote:
> > > We found an issue in pg_upgrade on a cluster with a third-party
> > > background worker. The upgrade goes fine, but the new cluster is then in
> > > an inconsistent state. The background worker comes from the PoWA
> > > extension but the issue does not appear to be related to this particular
> > > code.
> >
> > Well, it does imply that that bgworker did something, as the pure
> > existence of a bgworker shouldn't affect anything. Presumably the issue is
> > that the bgworker actually does transactional writes, which causes problems
> > because the xids / multixacts from early during pg_upgrade won't actually
> > be valid after we do pg_resetxlog etc.

Indeed, it does some writes. As soon as the powa bgworker starts, it takes "snapshots" of the pg_stat_statements state and records them in a local table. To avoid concurrent runs, it takes a lock on some of its local rows using SELECT FOR UPDATE, hence the mxid consumption.

The inconsistency occurs in at least two places:

* the datminmxid and relminmxid fields, pg_dump(all)'ed and restored in the new cluster;
* the multixid fields in the controlfile, read during the check phase and restored later using pg_resetxlog.

> > > As a solution, it seems that, for similar reasons that we restrict
> > > socket access to prevent accidental connections (from commit
> > > f763b77193), we should also prevent background workers from starting at
> > > this step.
> >
> > I think that'd have quite the potential for negative impact - [...]
>
> Thank you for those insights!

+1

> > I wonder if we could
> >
> > a) set default_transaction_read_only to true, and explicitly change it
> >    in the sessions that need that.

According to Denis' tests discussed off-list, it works fine with regard to the powa bgworker, albeit with some complaints in the logs. However, I wonder how fragile it could be, as a bgworker could easily override it, either through the GUC or with "BEGIN READ WRITE". I realize these are really uncommon practices, but bgworker code from third parties might be surprising.

> > b) when in binary upgrade mode / -b, error out on all WAL writes in
> >    sessions that don't explicitly set a session-level GUC to allow
> >    writes.

It feels safer because it is more specific to the subject. But I wonder if the benefit is worth adding some (limited?) complexity, and a small/quick conditional block in a very hot code path, for a very limited use case.

What about c), where bgworkers are not loaded by default during binary upgrade mode / -b unless they explicitly set a bgw_flags bit (BGWORKER_BINARY_UPGRADE?) indicating they are required during pg_upgrade?

Regards,
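Proposal (c) can be sketched as a small self-contained model. The flag names and values below are hypothetical illustrations (BGWORKER_BINARY_UPGRADE does not exist in the PostgreSQL tree), not actual postmaster code:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of proposal (c): during binary upgrade (-b), only start
 * bgworkers that explicitly opted in via a hypothetical bgw_flags bit.
 * Flag names and values are illustrative, not from PostgreSQL source. */

#define BGWORKER_SHMEM_ACCESS    0x0001  /* existing flag; value illustrative */
#define BGWORKER_BINARY_UPGRADE  0x8000  /* hypothetical opt-in flag */

static bool IsBinaryUpgrade = false;     /* true when postmaster runs under -b */

static bool
should_start_bgworker(int bgw_flags)
{
    /* In -b mode, skip workers that did not declare themselves upgrade-safe. */
    if (IsBinaryUpgrade && (bgw_flags & BGWORKER_BINARY_UPGRADE) == 0)
        return false;
    return true;
}
```

Under this sketch, a worker like powa's (no opt-in flag) would simply never be started during the upgrade, while a hypothetical table-AM IO worker that set the flag would keep running.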
Re: [PATCH] Disable bgworkers during servers start in pg_upgrade
From: Jehan-Guillaume de Rorthais
Oh, I forgot another point before sending my previous email. Maybe it would be worth adding some final safety checks in pg_upgrade itself? E.g. checking controldata and mxid file consistency between the old and new cluster would have caught the inconsistency here.
Julien Rouhaud a écrit :
> On Wed, Jan 27, 2021 at 02:41:32PM +0100, Jehan-Guillaume de Rorthais wrote:
>> On Wed, 27 Jan 2021 11:25:11 +0100
>> Denis Laxalde <denis.laxalde@dalibo.com> wrote:
>>> Andres Freund a écrit :
>>>> I wonder if we could
>>>>
>>>> a) set default_transaction_read_only to true, and explicitly change it
>>>>    in the sessions that need that.
>>
>> According to Denis' tests discussed off-list, it works fine with regard to
>> the powa bgworker, albeit with some complaints in the logs. However, I
>> wonder how fragile it could be, as a bgworker could easily override it,
>> either through the GUC or with "BEGIN READ WRITE". I realize these are
>> really uncommon practices, but bgworker code from third parties might be
>> surprising.
>
> Given that having any writes happening at the wrong moment on the old cluster
> can end up corrupting the new cluster, and that the corruption might not be
> immediately visible, we should try to put in as many safeguards as possible.
>
> So +1 for the default_transaction_read_only as done in Denis' patch at
> minimum, but not only.
>
> AFAICT it should be easy to prevent all background workers from being
> launched by adding a check on IsBinaryUpgrade at the beginning of
> bgworker_should_start_now(). It won't prevent modules from being loaded, so
> this approach shouldn't be problematic.

Please find attached another patch implementing this suggestion (as a complement to the previous patch setting default_transaction_read_only).

>>>> b) when in binary upgrade mode / -b, error out on all WAL writes in
>>>>    sessions that don't explicitly set a session-level GUC to allow
>>>>    writes.
>>
>> It feels safer because it is more specific to the subject. But I wonder if
>> the benefit is worth adding some (limited?) complexity, and a small/quick
>> conditional block in a very hot code path, for a very limited use case.
>
> I don't think that it would add that much complexity or overhead, as there's
> already all the infrastructure to prevent WAL writes in certain conditions.
> It should be enough to add an additional test in XLogInsertAllowed() with
> some new variable that is set when starting in binary upgrade mode, and a new
> function to disable it that will be emitted by pg_dump / pg_dumpall in binary
> upgrade mode.

This part is less clear to me so I'm not sure I'd be able to work on it.
Attachment
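The XLogInsertAllowed() idea quoted above could look roughly like the following toy model. All names here are assumptions loosely inspired by the real function, not backend code:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model (not PostgreSQL source) of gating WAL writes: refuse WAL
 * inserts when the server was started in binary upgrade mode, unless the
 * session explicitly opted in -- as the sessions driven by pg_dump /
 * pg_restore under pg_upgrade would. */

static bool binary_upgrade_mode = false;           /* set at startup under -b */
static bool session_allows_upgrade_writes = false; /* set by a hypothetical
                                                    * function emitted by
                                                    * pg_dump in -b mode */

static bool
xlog_insert_allowed(void)
{
    if (binary_upgrade_mode && !session_allows_upgrade_writes)
        return false;   /* a real backend would elog(ERROR) at the write site */
    return true;
}
```

The appeal of this variant is that it catches any write path - bgworker, demand-loaded library, or postmaster component - rather than only bgworker-originated writes.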
> On 24 Aug 2021, at 16:40, Denis Laxalde <denis.laxalde@dalibo.com> wrote: > Please find attached another patch implementing this suggestion (as a complement to the previous patch setting default_transaction_read_only). Please add this to the upcoming commitfest to make sure it's not missed: https://commitfest.postgresql.org/34/ -- Daniel Gustafsson https://vmware.com/
Michael Paquier a écrit :
>> @@ -5862,6 +5862,9 @@ do_start_bgworker(RegisteredBgWorker *rw)
>>  static bool
>>  bgworker_should_start_now(BgWorkerStartTime start_time)
>>  {
>> +	if (IsBinaryUpgrade)
>> +		return false;
>> +
> Using -c max_worker_processes=0 would just have the same effect, no?
> So we could just patch pg_upgrade's server.c to get the same level of
> protection?

Yes, same effect indeed. This would log "too many background workers" messages in the pg_upgrade logs, though. See attached patch implementing this suggestion.
Attachment
On Thu, Aug 26, 2021 at 3:15 PM Denis Laxalde <denis.laxalde@dalibo.com> wrote:
>
> Michael Paquier a écrit :
> >> @@ -5862,6 +5862,9 @@ do_start_bgworker(RegisteredBgWorker *rw)
> >>  static bool
> >>  bgworker_should_start_now(BgWorkerStartTime start_time)
> >>  {
> >> +	if (IsBinaryUpgrade)
> >> +		return false;
> >> +
> > Using -c max_worker_processes=0 would just have the same effect, no?
> > So we could just patch pg_upgrade's server.c to get the same level of
> > protection?
>
> Yes, same effect indeed. This would log "too many background workers"
> messages in pg_upgrade logs, though.
> See attached patch implementing this suggestion.

I disagree. It can appear to have the same effect but it's not guaranteed. Any module in shared_preload_libraries could stick a "max_worker_processes += X" in if it thinks it should account for its own resources. That may not be something encouraged, but it's definitely possible (and I think Andres recently mentioned that some extensions do things like that, although maybe for other GUCs) and could result in a corruption of a pg_upgrade'd cluster, so I still think that changing bgworker_should_start_now() is a better option.
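The difference between the two approaches can be shown in miniature. This is a self-contained sketch, not postmaster code; the point is only that a hard IsBinaryUpgrade check cannot be defeated by an extension raising the GUC, whereas a max_worker_processes cap can:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model contrasting the two protections discussed: an IsBinaryUpgrade
 * check in the start routine blocks every worker unconditionally, while a
 * max_worker_processes=0 cap can be undone by a module bumping the GUC
 * from its _PG_init().  Not PostgreSQL source. */

static bool IsBinaryUpgrade = false;
static int  max_worker_processes = 0;   /* as forced by pg_upgrade's server.c */

static bool
worker_may_start(int registered_workers)
{
    if (IsBinaryUpgrade)
        return false;                   /* the proposed hard stop */
    return registered_workers < max_worker_processes;
}
```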
On Thu, Aug 26, 2021 at 03:24:33PM +0800, Julien Rouhaud wrote:
> On Thu, Aug 26, 2021 at 3:15 PM Denis Laxalde <denis.laxalde@dalibo.com> wrote:
> >
> > Michael Paquier a écrit :
> > >> @@ -5862,6 +5862,9 @@ do_start_bgworker(RegisteredBgWorker *rw)
> > >>  static bool
> > >>  bgworker_should_start_now(BgWorkerStartTime start_time)
> > >>  {
> > >> +	if (IsBinaryUpgrade)
> > >> +		return false;
> > >> +
> > > Using -c max_worker_processes=0 would just have the same effect, no?
> > > So we could just patch pg_upgrade's server.c to get the same level of
> > > protection?
> >
> > Yes, same effect indeed. This would log "too many background workers"
> > messages in pg_upgrade logs, though.
> > See attached patch implementing this suggestion.
>
> I disagree. It can appear to have the same effect but it's not
> guaranteed. Any module in shared_preload_libraries could stick a
> "max_worker_processes += X" in if it thinks it should account for its
> own resources. That may not be something encouraged, but it's definitely
> possible (and I think Andres recently mentioned that some extensions
> do things like that, although maybe for other GUCs) and could result
> in a corruption of a pg_upgrade'd cluster, so I still think that
> changing bgworker_should_start_now() is a better option.

I am not sure. We have a lot of pg_upgrade code that turns off things like autovacuum and that has worked fine:

    snprintf(cmd, sizeof(cmd),
             "\"%s/pg_ctl\" -w -l \"%s\" -D \"%s\" -o \"-p %d%s%s %s%s\" start",
             cluster->bindir, SERVER_LOG_FILE, cluster->pgconfig, cluster->port,
             (cluster->controldata.cat_ver >= BINARY_UPGRADE_SERVER_FLAG_CAT_VER) ?
                 " -b" :
                 " -c autovacuum=off -c autovacuum_freeze_max_age=2000000000",
             (cluster == &new_cluster) ?
                 " -c synchronous_commit=off -c fsync=off -c full_page_writes=off -c vacuum_defer_cleanup_age=0" : "",
             cluster->pgopts ? cluster->pgopts : "", socket_string);

Basically, pg_upgrade has avoided any backend changes that could be controlled by GUCs and I am not sure we want to start adding such changes for just this.

--
Bruce Momjian <bruce@momjian.us>  https://momjian.us
EDB                               https://enterprisedb.com

If only the physical world exists, free will is an illusion.
> On 26 Aug 2021, at 15:09, Bruce Momjian <bruce@momjian.us> wrote: > On Thu, Aug 26, 2021 at 03:24:33PM +0800, Julien Rouhaud wrote: >> .. I still think that >> changing bgworker_should_start_now() is a better option. > > I am not sure. We have a lot of pg_upgrade code that turns off things > like autovacuum and that has worked fine: True, but there are also conditionals on IsBinaryUpgrade for starting the autovacuum launcher in the postmaster, so there is some precedent. > Basically, pg_upgrade has avoided any backend changes that could be > controlled by GUCs and I am not sure we want to start adding such > changes for just this. In principle I think it’s sound to try to avoid backend changes where possible without sacrificing robustness. -- Daniel Gustafsson https://vmware.com/
On Thu, Aug 26, 2021 at 03:38:23PM +0200, Daniel Gustafsson wrote: > > On 26 Aug 2021, at 15:09, Bruce Momjian <bruce@momjian.us> wrote: > > On Thu, Aug 26, 2021 at 03:24:33PM +0800, Julien Rouhaud wrote: > > >> .. I still think that > >> changing bgworker_should_start_now() is a better option. > > > > I am not sure. We have a lot of pg_upgrade code that turns off things > > like autovacuum and that has worked fine: > > True, but there are also conditionals on IsBinaryUpgrade for starting the > autovacuum launcher in the postmaster, so there is some precedent. Oh, I was not aware of that. > > Basically, pg_upgrade has avoided any backend changes that could be > > controlled by GUCs and I am not sure we want to start adding such > > changes for just this. > > In principle I think it’s sound to try to avoid backend changes where possible > without sacrificing robustness. Agreed. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Le jeu. 26 août 2021 à 21:38, Daniel Gustafsson <daniel@yesql.se> a écrit :
> On 26 Aug 2021, at 15:09, Bruce Momjian <bruce@momjian.us> wrote:
> Basically, pg_upgrade has avoided any backend changes that could be
> controlled by GUCs and I am not sure we want to start adding such
> changes for just this.
In principle I think it’s sound to try to avoid backend changes where possible
without sacrificing robustness.
I agree, but it seems much more likely that an extension relying on a bgworker changes this GUC, compared to an extension forcing autovacuum to be on, for instance.
> On 26 Aug 2021, at 15:43, Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> Le jeu. 26 août 2021 à 21:38, Daniel Gustafsson <daniel@yesql.se> a écrit :
> > On 26 Aug 2021, at 15:09, Bruce Momjian <bruce@momjian.us> wrote:
> >
> > Basically, pg_upgrade has avoided any backend changes that could be
> > controlled by GUCs and I am not sure we want to start adding such
> > changes for just this.
>
> In principle I think it’s sound to try to avoid backend changes where
> possible without sacrificing robustness.
>
> I agree, but it seems much more likely that an extension relying on a
> bgworker changes this GUC, compared to an extension forcing autovacuum to
> be on for instance.

Agreed, in this particular case I think there is merit to the idea of enforcing it in the backend.

--
Daniel Gustafsson   https://vmware.com/
Bruce Momjian a écrit :
> On Thu, Aug 26, 2021 at 03:38:23PM +0200, Daniel Gustafsson wrote:
>>> On 26 Aug 2021, at 15:09, Bruce Momjian <bruce@momjian.us> wrote:
>>> On Thu, Aug 26, 2021 at 03:24:33PM +0800, Julien Rouhaud wrote:
>>>> .. I still think that
>>>> changing bgworker_should_start_now() is a better option.
>>> I am not sure. We have a lot of pg_upgrade code that turns off things
>>> like autovacuum and that has worked fine:
>> True, but there are also conditionals on IsBinaryUpgrade for starting the
>> autovacuum launcher in the postmaster, so there is some precedent.
> Oh, I was not aware of that.

If I understand correctly, autovacuum is turned off by pg_upgrade code only if the old cluster does not support the -b flag (prior to 9.1 apparently). Otherwise, this is indeed handled by IsBinaryUpgrade.
On Thu, Aug 26, 2021 at 03:59:49PM +0200, Daniel Gustafsson wrote:
> > On 26 Aug 2021, at 15:43, Julien Rouhaud <rjuju123@gmail.com> wrote:
> >
> > Le jeu. 26 août 2021 à 21:38, Daniel Gustafsson <daniel@yesql.se> a écrit :
> > > On 26 Aug 2021, at 15:09, Bruce Momjian <bruce@momjian.us> wrote:
> > >
> > > Basically, pg_upgrade has avoided any backend changes that could be
> > > controlled by GUCs and I am not sure we want to start adding such
> > > changes for just this.
> >
> > In principle I think it’s sound to try to avoid backend changes where
> > possible without sacrificing robustness.
> >
> > I agree, but it seems much more likely that an extension relying on a
> > bgworker changes this GUC, compared to an extension forcing autovacuum
> > to be on for instance.
>
> Agreed, in this particular case I think there is merit to the idea of
> enforcing it in the backend.

OK, works for me.

--
Bruce Momjian <bruce@momjian.us>  https://momjian.us
EDB                               https://enterprisedb.com

If only the physical world exists, free will is an illusion.
On Thu, Aug 26, 2021 at 10:34:48AM -0400, Bruce Momjian wrote: > On Thu, Aug 26, 2021 at 03:59:49PM +0200, Daniel Gustafsson wrote: >> Agreed, in this particular case I think there is merit to the idea of enforcing >> it in the backend. > > OK, works for me Indeed, there is some history here with autovacuum. I have not been careful enough to check that. Still, putting a check on IsBinaryUpgrade in bgworker_should_start_now() would mean that we still keep track of the set of bgworkers registered in shared memory. Wouldn't it be better to block things at the source, as of RegisterBackgroundWorker()? And that would keep track of the control we have on bgworkers in a single place. I also think that we'd better document something about that either in bgworker.sgml or pg_upgrade's page. -- Michael
Attachment
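Blocking at registration time, as suggested above, might be modeled like this. Names and the boolean return value are illustrative (the real RegisterBackgroundWorker() returns void); the sketch only shows the behavioral consequence that the worker never enters the registered-worker list:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model (not PostgreSQL source) of blocking at registration time:
 * under -b the worker is silently dropped and never enters the
 * registered-worker list, so any hypothetical "was my worker
 * registered?" check an extension performs would report a failure. */

static bool IsBinaryUpgrade = false;
static int  registered_workers = 0;

static bool
register_background_worker(const char *name)
{
    (void) name;                /* name kept only for illustration */
    if (IsBinaryUpgrade)
        return false;           /* skip registration entirely under -b */
    registered_workers++;
    return true;
}
```

This is precisely the property that distinguishes it from the start-time check, where the worker stays tracked in shared memory but is never launched.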
On Fri, Aug 27, 2021 at 7:31 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> Indeed, there is some history here with autovacuum. I have not been
> careful enough to check that. Still, putting a check on
> IsBinaryUpgrade in bgworker_should_start_now() would mean that we
> still keep track of the set of bgworkers registered in shared memory.

That shouldn't lead to any problem, right?

> Wouldn't it be better to block things at the source, as of
> RegisterBackgroundWorker()? And that would keep track of the control
> we have on bgworkers in a single place. I also think that we'd better
> document something about that either in bgworker.sgml or pg_upgrade's
> page.

I'm fine with that approach too.
On Fri, Aug 27, 2021 at 09:34:24AM +0800, Julien Rouhaud wrote: > That shouldn't lead to any problem right? Well, bgworker_should_start_now() does not exist for that, and RegisterBackgroundWorker() is the one doing the classification job, so it would be more consistent to keep everything under control in the same code path. -- Michael
Attachment
On Fri, Aug 27, 2021 at 10:02 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Fri, Aug 27, 2021 at 09:34:24AM +0800, Julien Rouhaud wrote:
> > That shouldn't lead to any problem right?
>
> Well, bgworker_should_start_now() does not exist for that, and
> RegisterBackgroundWorker() is the one doing the classification job, so
> it would be more consistent to keep everything under control in the
> same code path.

I'm not sure it's that uncontroversial. The way I see RegisterBackgroundWorker() is that it's responsible for doing some sanity checks to verify that the module didn't make any error and that resources are available. Surely checking for IsBinaryUpgrade should not be the responsibility of third-party code, so the question is whether binary upgrade is seen as a resource, and as such a reason to forbid bgworker registration, as opposed to forbidding the launch itself.

Right now AFAICT there's no official API to check whether a call to RegisterBackgroundWorker() succeeded or not, but an extension could check by itself using BackgroundWorkerList in bgworker_internals.h, and error out or something if it didn't succeed, as a way to inform users that they didn't configure the instance properly (like maybe increasing max_worker_processes). Surely using a *_internals.h header is a clear sign that you expose yourself to problems, but adding an official way to check for bgworker registration doesn't seem unreasonable to me. Is it worth the risk of having pg_upgrade error out in this kind of scenario, or of making the addition of a new API to check for registration status more difficult?
On Fri, Aug 27, 2021 at 11:25:19AM +0800, Julien Rouhaud wrote: > Right now AFAICT there's no official API to check if a call to > RegisterBackgroundWorker() succeeded or not, but an extension could > check by itself using BackgroundWorkerList in bgworker_internals.h, > and error out or something if it didn't succeed, as a way to inform > users that they didn't configure the instance properly (like maybe > increasing max_worker_processes). Surely using a *_internals.h header > is a clear sign that you expose yourself to problems, but adding an > official way to check for bgworker registration doesn't seem > unreasonable to me. Is that worth the risk to have pg_upgrade > erroring out in this kind of scenario, or make the addition of a new > API to check for registration status more difficult? Perhaps. That feels like a topic different than what's discussed here, though, because we don't really need to check if a bgworker has been launched or not. We just need to make sure that it never runs in the context of a binary upgrade, like autovacuum. -- Michael
Attachment
On Fri, Aug 27, 2021 at 12:41 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> Perhaps. That feels like a topic different than what's discussed
> here, though, because we don't really need to check if a bgworker has
> been launched or not. We just need to make sure that it never runs in
> the context of a binary upgrade, like autovacuum.

I'm a bit confused. Did you mean checking if a bgworker has been *registered* or not? But my point was that preventing a bgworker's registration, as a way to avoid it being launched, may lead to some problems if an extension decides that a failure in the registration is problematic enough to prevent the startup altogether, for instance. I'm OK if we decide that it's *not* an acceptable behavior, but it should be clear that this is the case, and probably documented.
Hi,

On 2021-08-27 09:34:24 +0800, Julien Rouhaud wrote:
> On Fri, Aug 27, 2021 at 7:31 AM Michael Paquier <michael@paquier.xyz> wrote:
> >
> > Indeed, there is some history here with autovacuum. I have not been
> > careful enough to check that. Still, putting a check on
> > IsBinaryUpgrade in bgworker_should_start_now() would mean that we
> > still keep track of the set of bgworkers registered in shared memory.
>
> That shouldn't lead to any problem right?
>
> > Wouldn't it be better to block things at the source, as of
> > RegisterBackgroundWorker()? And that would keep track of the control
> > we have on bgworkers in a single place. I also think that we'd better
> > document something about that either in bgworker.sgml or pg_upgrade's
> > page.
>
> I'm fine with that approach too.

Isn't that just going to end up with extension code erroring out and/or blocking waiting for a bgworker to start?

Greetings,

Andres Freund
On Fri, Aug 27, 2021 at 12:28:42PM -0700, Andres Freund wrote:
> Isn't that just going to end up with extension code erroring out and/or
> blocking waiting for a bgworker to start?

Well, that's the point: to block things during an upgrade. Do you have a list of requirements you'd like to see satisfied here? POWA would be fine with blocking the execution of bgworkers AFAIK (Julien, feel free to correct me here if necessary). It could be possible that preventing extension code from executing this way prevents other hazards as well. The idea from upthread to prevent any writes and/or WAL activity is not really different, as extensions may still generate an error because of the pg_upgrade safety measures we'd put in, no?

--
Michael
Attachment
On Sat, Aug 28, 2021 at 3:28 AM Andres Freund <andres@anarazel.de> wrote:
>
> > On Fri, Aug 27, 2021 at 7:31 AM Michael Paquier <michael@paquier.xyz> wrote:
> > >
> > > Wouldn't it be better to block things at the source, as of
> > > RegisterBackgroundWorker()? And that would keep track of the control
> > > we have on bgworkers in a single place. I also think that we'd better
> > > document something about that either in bgworker.sgml or pg_upgrade's
> > > page.
>
> Isn't that just going to end up with extension code erroring out and/or
> blocking waiting for a bgworker to start?

But there's no API to wait for the start of a non-dynamic bgworker, or to check for it, right? So I don't see how the extension code could wait or error out. As far as I know, the only thing you can do is call RegisterBackgroundWorker() in your _PG_init() code and hope that the server is correctly configured. The only thing that third-party code could do, I think, is to try to check whether the bgworker was successfully registered or not, as I mentioned in [1]. Maybe extra-paranoid code might add such a check in every executor hook, but the overhead would be so terrible that no one would use such an extension anyway.

[1] https://www.postgresql.org/message-id/CAOBaU_ZtR88x3Si6XwprqGo8UZNLncAQrD_-wc67sC=acO3g=w@mail.gmail.com
On Sat, Aug 28, 2021 at 9:40 AM Michael Paquier <michael@paquier.xyz> wrote: > > On Fri, Aug 27, 2021 at 12:28:42PM -0700, Andres Freund wrote: > > Isn't that just going to end up with extension code erroring out and/or > > blocking waiting for a bgworker to start? > > Well, that's the point to block things during an upgrade. Do you have > a list of requirements you'd like to see satisfied here? POWA would > be fine with blocking the execution of bgworkers AFAIK (Julien feel > free to correct me here if necessary). Yes, no problem at all, whether the bgworker isn't registered or never launched. The bgworker isn't even mandatory anymore since a few years, as we introduced an external daemon to collect metrics on a distant database.
On Sat, Aug 28, 2021 at 10:40:42AM +0900, Michael Paquier wrote: > On Fri, Aug 27, 2021 at 12:28:42PM -0700, Andres Freund wrote: > > Isn't that just going to end up with extension code erroring out and/or > > blocking waiting for a bgworker to start? > > Well, that's the point to block things during an upgrade. Do you have > a list of requirements you'd like to see satisfied here? POWA would > be fine with blocking the execution of bgworkers AFAIK (Julien feel > free to correct me here if necessary). It could be possible that > preventing extension code to execute this way could prevent hazards as > well. The idea from upthread to prevent any writes and/or WAL > activity is not really different as extensions may still generate an > error because of pg_upgrade's safety measures we'd put in, no? This thread is now almost one year old, and AFAICT there's still no consensus on how to fix this problem. It would be good to have something done in pg15, ideally backpatched, as this is a corruption hazard that triggered at least once already. Andres, do you still have an objection with either preventing bgworker registration/launch or WAL-write during the impacted pg_upgrade steps, or a better alternative to fix the problem?
Hi,

On 2022-01-12 17:54:31 +0800, Julien Rouhaud wrote:
> On Sat, Aug 28, 2021 at 10:40:42AM +0900, Michael Paquier wrote:
> > On Fri, Aug 27, 2021 at 12:28:42PM -0700, Andres Freund wrote:
> > > Isn't that just going to end up with extension code erroring out and/or
> > > blocking waiting for a bgworker to start?
> >
> > Well, that's the point to block things during an upgrade. Do you have
> > a list of requirements you'd like to see satisfied here? POWA would
> > be fine with blocking the execution of bgworkers AFAIK (Julien feel
> > free to correct me here if necessary). It could be possible that
> > preventing extension code to execute this way could prevent hazards as
> > well. The idea from upthread to prevent any writes and/or WAL
> > activity is not really different as extensions may still generate an
> > error because of pg_upgrade's safety measures we'd put in, no?

The point is that we need the check for WAL writes / xid assignments / etc *either* way. There are ways extensions could trigger problems like e.g. an xid getting assigned, besides bgworkers doing stuff. Or postgres components doing so unintentionally.

Erroring out in situations where we *know* that there were concurrent changes unacceptable during pg_upgrade is always the right call. Such errors can be debugged and then addressed (removing the extension from s_p_l, fixing the extension, etc).

In contrast to that, preventing upgrades from succeeding because an extension has a dependency on bgworkers working, just because the bgworker *could* be doing something bad, is different. The bgworker might never write, might have a check for binary upgrade mode, etc. It may not be realistic to fix an extension to work without its bgworkers.

Imagine something like a table access method that has IO workers or such.

> Andres, do you still have an objection with either preventing bgworker
> registration/launch or WAL-write during the impacted pg_upgrade steps, or a
> better alternative to fix the problem?
I still object to the approach of preventing bgworker registration. It
doesn't provide much protection and might cause hard-to-address problems
for some extensions.

I don't think I ever objected to preventing WAL writes; I even proposed
that upthread? Unless you suggest doing it specifically in bgworkers,
rather than preventing similar problems outside bgworkers as well.

Greetings,

Andres Freund
Hi,

On Thu, Jan 13, 2022 at 06:44:12PM -0800, Andres Freund wrote:
>
> The point is that we need the check for WAL writes / xid assignments / etc
> *either* way. There are ways extensions could trigger problems like e.g. an
> xid getting assigned, besides bgworkers doing stuff. Or postgres components
> doing so unintentionally.
>
> Erroring out in situations where we *know* that there were concurrent changes
> unacceptable during pg_upgrade is always the right call. Such errors can be
> debugged and then addressed (removing the extension from s_p_l, fixing the
> extension, etc).
>
> In contrast to that, preventing upgrades from succeeding because an extension
> has a dependency on bgworkers working, just because the bgworker *could* be
> doing something bad, is different. The bgworker might never write, have a
> check for binary upgrade mode, etc. It may not be realistic to fix an
> extension to work without the bgworkers.
>
> Imagine something like a table access method that has IO workers or such.

IIUC, if a table access method has IO workers that start doing writes
quickly (or any similar extension that *is* required to be present during
upgrade but that should be partially disabled), the only way to do a
pg_upgrade would be to make sure that the extension explicitly checks for
the binary-upgrade mode and doesn't do any writes, or provides a GUC for
the same, since it should still be preloaded? I'm fine with that, but that
should probably be documented.

> > Andres, do you still have an objection with either preventing bgworker
> > registration/launch or WAL-write during the impacted pg_upgrade steps, or a
> > better alternative to fix the problem?
>
> I still object to the approach of preventing bgworker registration. It
> doesn't provide much protection and might cause hard-to-address problems for
> some extensions.
>
> I don't think I ever objected to preventing WAL writes; I even proposed that
> upthread?
> Unless you suggest doing it specifically in bgworkers, rather than
> preventing similar problems outside bgworkers as well.

Sorry, I missed that when re-reading the thread. And no, I'm not suggesting
preventing WAL writes in bgworkers only.

Since there's a clear consensus on how to fix the problem, I'm switching
the patch to Waiting on Author.
Hi,

Julien Rouhaud wrote:
>> On Wed, 27 Jan 2021 11:25:11 +0100
>> Denis Laxalde <denis.laxalde@dalibo.com> wrote:
>>
>>> Andres Freund wrote:
>>>> b) when in binary upgrade mode / -b, error out on all wal writes in
>>>> sessions that don't explicitly set a session-level GUC to allow
>>>> writes.
> It should be enough to add an additional test in XLogInsertAllowed() with some new
> variable that is set when starting in binary upgrade mode, and a new function
> to disable it that will be emitted by pg_dump / pg_dumpall in binary upgrade
> mode.

I tried that simple change first:

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index dfe2a0bcce..8feab0cb96 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -8498,6 +8498,9 @@ HotStandbyActiveInReplay(void)
 bool
 XLogInsertAllowed(void)
 {
+	if (IsBinaryUpgrade)
+		return false;
+
 	/*
 	 * If value is "unconditionally true" or "unconditionally false", just
 	 * return it. This provides the normal fast path once recovery is known

But then, pg_upgrade's tests (make -C src/bin/pg_upgrade/ check) fail at
vacuumdb but not during pg_dumpall:

$ cat src/bin/pg_upgrade/pg_upgrade_utility.log
-----------------------------------------------------------------
  pg_upgrade run on Fri Jan 28 10:37:36 2022
-----------------------------------------------------------------

command: "/home/denis/src/pgsql/build/tmp_install/home/denis/.local/pgsql/bin/pg_dumpall" --host /home/denis/src/pgsql/build/src/bin/pg_upgrade --port 51696 --username denis --globals-only --quote-all-identifiers --binary-upgrade -f pg_upgrade_dump_globals.sql >> "pg_upgrade_utility.log" 2>&1

command: "/home/denis/src/pgsql/build/tmp_install/home/denis/.local/pgsql/bin/vacuumdb" --host /home/denis/src/pgsql/build/src/bin/pg_upgrade --port 51696 --username denis --all --analyze >> "pg_upgrade_utility.log" 2>&1
vacuumdb: vacuuming database "postgres"
vacuumdb: error: processing of database "postgres"
failed: PANIC: cannot make new WAL entries during recovery

In contrast with pg_dump/pg_dumpall, vacuumdb has no --binary-upgrade flag,
so it does not seem possible to use a special GUC setting to allow WAL
writes in that vacuumdb session at the moment.
Should we add --binary-upgrade to vacuumdb as well? Or am I going in the
wrong direction?

Thanks,
Denis
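For illustration, here is a minimal self-contained C model of the guard
being discussed: WAL inserts are refused in binary-upgrade mode unless the
session has explicitly opted in, which is what a function call emitted by
pg_dump/pg_dumpall could do. This is a sketch, not PostgreSQL source; the
names allow_wal_in_binary_upgrade and BinaryUpgradeAllowWALWrites are
hypothetical.

```c
#include <stdbool.h>

/* Models the backend global set by the postmaster's -b flag. */
static bool IsBinaryUpgrade = false;

/* Hypothetical session-level override, off by default; a statement emitted
 * by pg_dump/pg_dumpall in binary-upgrade mode would turn it on. */
static bool allow_wal_in_binary_upgrade = false;

static void
BinaryUpgradeAllowWALWrites(void)
{
	allow_wal_in_binary_upgrade = true;
}

/* Sketch of the extra check in XLogInsertAllowed(): refuse WAL inserts
 * during binary upgrade unless the session explicitly opted in. */
static bool
XLogInsertAllowedSketch(void)
{
	if (IsBinaryUpgrade && !allow_wal_in_binary_upgrade)
		return false;
	return true;			/* stand-in for the normal recovery checks */
}
```

With such an override, the restore session run by pg_upgrade opts in while
any stray session (like the unmodified vacuumdb above) stays blocked.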
Hi,

On Fri, Jan 28, 2022 at 11:02:46AM +0100, Denis Laxalde wrote:
>
> I tried that simple change first:
>
> diff --git a/src/backend/access/transam/xlog.c
> b/src/backend/access/transam/xlog.c
> index dfe2a0bcce..8feab0cb96 100644
> --- a/src/backend/access/transam/xlog.c
> +++ b/src/backend/access/transam/xlog.c
> @@ -8498,6 +8498,9 @@ HotStandbyActiveInReplay(void)
>  bool
>  XLogInsertAllowed(void)
>  {
> +	if (IsBinaryUpgrade)
> +		return false;
> +
>
> But then, pg_upgrade's tests (make -C src/bin/pg_upgrade/ check) fail at
> vacuumdb but not during pg_dumpall:
>
> [...]
>
> command: "/home/denis/src/pgsql/build/tmp_install/home/denis/.local/pgsql/bin/vacuumdb"
> --host /home/denis/src/pgsql/build/src/bin/pg_upgrade --port 51696
> --username denis --all --analyze >> "pg_upgrade_utility.log" 2>&1
> vacuumdb: vacuuming database "postgres"
> vacuumdb: error: processing of database "postgres" failed: PANIC: cannot
> make new WAL entries during recovery
>
> In contrast with pg_dump/pg_dumpall, vacuumdb has no --binary-upgrade flag,
> so it does not seem possible to use a special GUC setting to allow WAL
> writes in that vacuumdb session at the moment.
> Should we add --binary-upgrade to vacuumdb as well? Or am I going in the
> wrong direction?

I think having a new option for vacuumdb is the right move.

It seems unlikely that any cron or similar on the host will try to run some
concurrent vacuumdb, but we still have to enforce that only the one
executed by pg_upgrade can succeed.

I guess it could be an undocumented option, similar to postgres' -b, which
would only be allowed iff --all and --freeze are also passed, to be extra
safe.

While at it:

> vacuumdb: error: processing of database "postgres" failed: PANIC: cannot
> make new WAL entries during recovery

Should we tweak that message when IsBinaryUpgrade is true?
Julien Rouhaud wrote:
> I think having a new option for vacuumdb is the right move.
>
> It seems unlikely that any cron or similar on the host will try to run some
> concurrent vacuumdb, but we still have to enforce that only the one executed by
> pg_upgrade can succeed.
>
> I guess it could be an undocumented option, similar to postgres' -b, which
> would only be allowed iff --all and --freeze is also passed to be extra safe.

The help text in pg_dump's man page states:

    --binary-upgrade
        This option is for use by in-place upgrade utilities. Its use
        for other purposes is not recommended or supported. The behavior
        of the option may change in future releases without notice.

Is it enough? Or do we actually want to hide it for vacuumdb?

> While at it:
>
>> vacuumdb: error: processing of database "postgres" failed: PANIC: cannot
>> make new WAL entries during recovery
>
> Should we tweak that message when IsBinaryUpgrade is true?

Yes, indeed. I had in mind to simply make the message more generic, as:
"cannot insert new WAL entries".
On Fri, Jan 28, 2022 at 03:06:57PM +0100, Denis Laxalde wrote:
> Julien Rouhaud wrote:
> > I think having a new option for vacuumdb is the right move.
> >
> > It seems unlikely that any cron or similar on the host will try to run some
> > concurrent vacuumdb, but we still have to enforce that only the one executed by
> > pg_upgrade can succeed.
> >
> > I guess it could be an undocumented option, similar to postgres' -b, which
> > would only be allowed iff --all and --freeze is also passed to be extra safe.
>
> The help text in pg_dump's man page states:
>
>     --binary-upgrade
>         This option is for use by in-place upgrade utilities. Its use
>         for other purposes is not recommended or supported. The behavior
>         of the option may change in future releases without notice.
>
> Is it enough? Or do we actually want to hide it for vacuumdb?

I think it should be hidden, with a comment about it like postmaster.c's
getopt call:

    case 'b':
        /* Undocumented flag used for binary upgrades */

> > > vacuumdb: error: processing of database "postgres" failed: PANIC: cannot
> > > make new WAL entries during recovery
> >
> > Should we tweak that message when IsBinaryUpgrade is true?
>
> Yes, indeed, I had in mind to simply make the message more generic as:
> "cannot insert new WAL entries".

-1, it's good to have a clear reason why the error happened, especially
since it's supposed to be a "should not happen" situation.
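As a sketch of the constraint proposed above (a hidden vacuumdb option
accepted only together with --all and --freeze), the validation could look
like this. The function name and message are illustrative, not vacuumdb's
actual code:

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical check for a hidden vacuumdb binary-upgrade option: accept
 * it only when --all and --freeze were also given, so that only the
 * invocation made by pg_upgrade itself can use it. */
static bool
check_binary_upgrade_option(bool binary_upgrade, bool all, bool freeze)
{
	if (binary_upgrade && !(all && freeze))
	{
		fprintf(stderr,
				"vacuumdb: the binary-upgrade option requires --all and --freeze\n");
		return false;
	}
	return true;
}
```

Keeping the flag out of --help output while documenting it with a source
comment, as with postmaster's -b, would match the suggestion above.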
On Fri, Jan 28, 2022 at 9:51 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> On Fri, Jan 28, 2022 at 10:20:07AM -0800, Andres Freund wrote:
> > On 2022-01-28 21:56:36 +0800, Julien Rouhaud wrote:
> > > I think having a new option for vacuumdb is the right move.
> >
> > Can't we pass the option via the connection string, e.g. something
> > PGOPTIONS='-c binary_upgrade_mode=true'? That seems to scale better than to
> > add it gradually to multiple tools.
>
> Ah right that's a better idea.

OK, so I think the conclusion here is that no patch which does $SUBJECT is
going to get committed, but somebody might write (or finish?) a patch that
does something else which could possibly get committed once it's written.
If and when that happens, I think that patch should be submitted on a new
thread with a subject line that matches what the patch actually does. In
the meantime, I'm going to mark the CF entry for *this* thread as Returned
with Feedback.

For what it's worth, I'm not 100% sure that $SUBJECT is a bad idea -- nor
am I 100% sure that it's a good idea. On the other hand, I definitely think
the alternative proposal of blocking WAL writes at times when they
shouldn't be happening is a good idea, and most likely extensions should
also be coded in a way where they're smart enough not to try except at
times when it is safe. Therefore, it makes sense to me to proceed along
those kinds of lines for now, and if that's not enough and we need to
revisit this idea at some point in the future, we can.

Note that I'm taking no view for the present on whether any change that
might end up being agreed here should go into v15 or not. It's in that
fuzzy grey area where you could call it a feature, or a bug fix, or
technically-a-feature-but-let's-slip-it-in-after-freeze-anyway. We can
decide that when a completed patch shows up, though it's fair to point out
that the longer that takes, the less likely it is to be v15 material.
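The PGOPTIONS idea quoted above can be sketched as follows. The GUC name
binary_upgrade_mode is the hypothetical one from the thread; the point is
that pg_upgrade exports the override once and every stock client tool it
spawns forwards it to the backend via libpq startup options, with no
per-tool flag needed:

```shell
# Hypothetical: pg_upgrade sets the override in its environment; libpq
# passes the value of PGOPTIONS to the server as command-line options
# for each session the spawned tools open.
export PGOPTIONS='-c binary_upgrade_mode=true'

# pg_upgrade would then run the unmodified clients, e.g.:
#   vacuumdb --all --analyze
#   pg_dumpall --globals-only --binary-upgrade
echo "$PGOPTIONS"
```

This scales to any client without touching its option parsing, which is the
"scale better" point made in the quote.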
I am, however, taking the position that holding this CommitFest entry open
is not in the best interest of the project. The patch we'd theoretically be
committing doesn't exist yet and doesn't do what the subject line suggests.

Thanks,

--
Robert Haas
EDB: http://www.enterprisedb.com