Thread: Re: pg_upgrade check for invalid databases

Re: pg_upgrade check for invalid databases

From

Nathan Bossart

Date:

30 September 2024, 17:12:41

On Sun, Sep 29, 2024 at 08:45:50PM -0400, Thomas Krennwallner wrote:
> if a cluster contains invalid databases that we cannot connect to anymore,
> pg_upgrade would currently fail when trying to connect to the first
> encountered invalid database with
> 
> [...]
> 
> If there is more than one invalid database, we need to run pg_upgrade more
> than once (unless the user inspects pg_database).
> 
> I attached two small patches for PG 17 and PG 18 (can be easily backported
> to all previous versions upon request).  Instead of just failing to connect
> with an error, we collect all invalid databases in a report file
> invalid_databases.txt:

Should we have pg_upgrade skip invalid databases?  If the only valid action
is to drop them, IMHO it seems unnecessary to require users to manually
drop them before retrying pg_upgrade.

-- 
nathan

Re: pg_upgrade check for invalid databases

From

Tom Lane

Date:

30 September 2024, 17:55:39

Nathan Bossart <nathandbossart@gmail.com> writes:
> Should we have pg_upgrade skip invalid databases?  If the only valid action
> is to drop them, IMHO it seems unnecessary to require users to manually
> drop them before retrying pg_upgrade.

I was thinking the same.  But I wonder if there is any chance of
losing data that could be recoverable.  It feels like this should
not be a default behavior.

TBH I'm not finding anything very much wrong with the current
behavior... this has to be a rare situation, do we need to add
debatable behavior to make it easier?

            regards, tom lane

Re: pg_upgrade check for invalid databases

From

Daniel Gustafsson

Date:

01 October 2024, 00:29:35

> On 30 Sep 2024, at 16:55, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> TBH I'm not finding anything very much wrong with the current
> behavior... this has to be a rare situation, do we need to add
> debatable behavior to make it easier?

One argument would be to make the checks consistent, pg_upgrade generally tries
to report all the offending entries to help the user when fixing the source
database.  Not sure if it's a strong enough argument for carrying code which
really shouldn't see much use though.

--
Daniel Gustafsson

Re: pg_upgrade check for invalid databases

From

Tom Lane

Date:

01 October 2024, 01:20:41

Daniel Gustafsson <daniel@yesql.se> writes:
>> On 30 Sep 2024, at 16:55, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> TBH I'm not finding anything very much wrong with the current
>> behavior... this has to be a rare situation, do we need to add
>> debatable behavior to make it easier?

> One argument would be to make the checks consistent, pg_upgrade generally tries
> to report all the offending entries to help the user when fixing the source
> database.  Not sure if it's a strong enough argument for carrying code which
> really shouldn't see much use though.

OK, but the consistency argument would be to just report and fail.
I don't think there's a precedent in other pg_upgrade checks for
trying to fix problems automatically.

            regards, tom lane

Re: pg_upgrade check for invalid databases

From

Thomas Krennwallner

Date:

01 October 2024, 03:35:12

On 30/09/2024 17.29, Daniel Gustafsson wrote:
>> On 30 Sep 2024, at 16:55, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> 
>> TBH I'm not finding anything very much wrong with the current
>> behavior... this has to be a rare situation, do we need to add
>> debatable behavior to make it easier?
> 
> One argument would be to make the checks consistent, pg_upgrade generally tries
> to report all the offending entries to help the user when fixing the source
> database.  Not sure if it's a strong enough argument for carrying code which
> really shouldn't see much use though.
In general, I agree that this situation should be rare for deliberate 
DROP DATABASE interrupted in interactive sessions.

Unfortunately, for (popular) tools that perform automatic "temporary 
database" cleanup, we could recently see an increase in invalid databases.

The additional check for pg_upgrade was made necessary due to several 
unrelated customers having invalid databases that stem from left-over 
Prisma Migrate "shadow databases" [1]. We could not reproduce this 
Prisma Migrate issue yet, as those migrations happened some time ago. 
Maybe this bug really stems from a much older Prisma Migrate version and 
we only see the fallout now. This is still a TODO item.

But it appears that this tool can get interrupted "at the wrong time" 
while it is deleting temporary databases (probably a manual Ctrl-C), and 
clients are unaware that this can then leave behind invalid databases.

Those temporary databases do not cause any harm as they are not used 
anymore. But eventually, PG installations will be upgraded to the next 
major version, and it is only then when those invalid databases 
resurface after pg_upgrade fails to run the checks.

Long story short: interactive DROP DATABASE interrupts are rare (they do 
exist, but customers are usually aware). Automation tools on the other 
hand may run DROP DATABASE and when they get interrupted at the wrong 
time they will then produce several left-over invalid databases. 
pg_upgrade will then fail to run the checks.

[1] 
https://www.prisma.io/docs/orm/prisma-migrate/understanding-prisma-migrate/shadow-database

Re: pg_upgrade check for invalid databases

From

Daniel Gustafsson

Date:

01 October 2024, 10:28:54

> On 1 Oct 2024, at 00:20, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Daniel Gustafsson <daniel@yesql.se> writes:
>>> On 30 Sep 2024, at 16:55, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> TBH I'm not finding anything very much wrong with the current
>>> behavior... this has to be a rare situation, do we need to add
>>> debatable behavior to make it easier?
>
>> One argument would be to make the checks consistent, pg_upgrade generally tries
>> to report all the offending entries to help the user when fixing the source
>> database.  Not sure if it's a strong enough argument for carrying code which
>> really shouldn't see much use though.
>
> OK, but the consistency argument would be to just report and fail.
> I don't think there's a precedent in other pg_upgrade checks for
> trying to fix problems automatically.

Correct, sorry for being unclear.  The consistency argument would be to expand
pg_upgrade to report all invalid databases rather than just the first found;
attempting to fix problems would be a new behavior.

--
Daniel Gustafsson

Re: pg_upgrade check for invalid databases

From

Daniel Gustafsson

Date:

01 October 2024, 10:42:22

> On 1 Oct 2024, at 02:35, Thomas Krennwallner <tk@postsubmeta.net> wrote:
>
> On 30/09/2024 17.29, Daniel Gustafsson wrote:
>>> On 30 Sep 2024, at 16:55, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> TBH I'm not finding anything very much wrong with the current
>>> behavior... this has to be a rare situation, do we need to add
>>> debatable behavior to make it easier?
>> One argument would be to make the checks consistent, pg_upgrade generally tries
>> to report all the offending entries to help the user when fixing the source
>> database.  Not sure if it's a strong enough argument for carrying code which
>> really shouldn't see much use though.
> In general, I agree that this situation should be rare for deliberate DROP DATABASE interrupted in interactive
sessions.
>
> Unfortunately, for (popular) tools that perform automatic "temporary database" cleanup, we could recently see an
increasein invalid databases. 
>
> The additional check for pg_upgrade was made necessary due to several unrelated customers having invalid databases
thatstem from left-over Prisma Migrate "shadow databases" [1]. We could not reproduce this Prisma Migrate issue yet, as
thosemigrations happened some time ago. Maybe this bug really stems from a much older Prisma Migrate version and we
onlysee the fallout now. This is still a TODO item. 
>
> But it appears that this tool can get interrupted "at the wrong time" while it is deleting temporary databases
(probablya manual Ctrl-C), and clients are unaware that this can then leave behind invalid databases. 
>
> Those temporary databases do not cause any harm as they are not used anymore. But eventually, PG installations will
beupgraded to the next major version, and it is only then when those invalid databases resurface after pg_upgrade fails
torun the checks. 

Databases containing transient data no longer needed left by buggy tools is one
thing, but pg_upgrade won't be able to differentiate between those and invalid
databases of legitimate interest.  Allowing pg_upgrade to skip invalid
databases expose the risk of (potentially) valuable data being dropped during
the upgrade due to the user not having realized a rarely-used production
database was invalid.

> Long story short: interactive DROP DATABASE interrupts are rare (they do exist, but customers are usually aware).
Automationtools on the other hand may run DROP DATABASE and when they get interrupted at the wrong time they will then
produceseveral left-over invalid databases. pg_upgrade will then fail to run the checks. 

Checking and reporting all invalid databases during the check phase seems like
a user-friendly option here, I can agree that the current behaviour isn't great
for users experiencing this issue.

--
Daniel Gustafsson

Re: pg_upgrade check for invalid databases

From

Bruce Momjian

Date:

07 October 2024, 22:37:35

On Tue, Oct  1, 2024 at 09:28:54AM +0200, Daniel Gustafsson wrote:
> > On 1 Oct 2024, at 00:20, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > 
> > Daniel Gustafsson <daniel@yesql.se> writes:
> >>> On 30 Sep 2024, at 16:55, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >>> TBH I'm not finding anything very much wrong with the current
> >>> behavior... this has to be a rare situation, do we need to add
> >>> debatable behavior to make it easier?
> > 
> >> One argument would be to make the checks consistent, pg_upgrade generally tries
> >> to report all the offending entries to help the user when fixing the source
> >> database.  Not sure if it's a strong enough argument for carrying code which
> >> really shouldn't see much use though.
> > 
> > OK, but the consistency argument would be to just report and fail.
> > I don't think there's a precedent in other pg_upgrade checks for
> > trying to fix problems automatically.
> 
> Correct, sorry for being unclear.  The consistency argument would be to expand
> pg_upgrade to report all invalid databases rather than just the first found;
> attempting to fix problems would be a new behavior.

Yes, historically pg_upgrade will fail if it finds anything unusual,
mostly because what it does normally is already scary enough.  If users
what pg_upgrade to do cleanups, it would be enabled by a separate flag,
or even a new command-line app.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  When a patient asks the doctor, "Am I going to die?", he means 
  "Am I going to die soon?"

Re: pg_upgrade check for invalid databases

From

Nathan Bossart

Date:

07 October 2024, 23:04:22

On Mon, Oct 07, 2024 at 03:37:35PM -0400, Bruce Momjian wrote:
> On Tue, Oct  1, 2024 at 09:28:54AM +0200, Daniel Gustafsson wrote:
>> Correct, sorry for being unclear.  The consistency argument would be to expand
>> pg_upgrade to report all invalid databases rather than just the first found;
>> attempting to fix problems would be a new behavior.
> 
> Yes, historically pg_upgrade will fail if it finds anything unusual,
> mostly because what it does normally is already scary enough.  If users
> what pg_upgrade to do cleanups, it would be enabled by a separate flag,
> or even a new command-line app.

While I suspect it's rare that someone CTRL-C's out of an accidental DROP
DATABASE and then runs pg_upgrade before trying to recover the data, I
agree with the principle of having pg_upgrade fail by default for things
like this.  If we did add a new flag, the new invalid database report that
Daniel mentions could say something like "try again with
--skip-invalid-databases to have pg_upgrade automatically drop invalid
databases."

-- 
nathan

Re: pg_upgrade check for invalid databases

From

Daniel Gustafsson

Date:

11 October 2024, 11:00:49

> On 7 Oct 2024, at 22:04, Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> On Mon, Oct 07, 2024 at 03:37:35PM -0400, Bruce Momjian wrote:
>> On Tue, Oct  1, 2024 at 09:28:54AM +0200, Daniel Gustafsson wrote:
>>> Correct, sorry for being unclear.  The consistency argument would be to expand
>>> pg_upgrade to report all invalid databases rather than just the first found;
>>> attempting to fix problems would be a new behavior.
>>
>> Yes, historically pg_upgrade will fail if it finds anything unusual,
>> mostly because what it does normally is already scary enough.  If users
>> what pg_upgrade to do cleanups, it would be enabled by a separate flag,
>> or even a new command-line app.
>
> While I suspect it's rare that someone CTRL-C's out of an accidental DROP
> DATABASE and then runs pg_upgrade before trying to recover the data, I
> agree with the principle of having pg_upgrade fail by default for things
> like this.  If we did add a new flag, the new invalid database report that
> Daniel mentions could say something like "try again with
> --skip-invalid-databases to have pg_upgrade automatically drop invalid
> databases."

If we are teaching pg_upgrade to handle errors, either by skipping or by
fixing, then I believe this is the right way to go about it.  A successful run
should probably also create a report of the databases which were skipped.

--
Daniel Gustafsson

Re: pg_upgrade check for invalid databases

From

Bruce Momjian

Date:

14 October 2024, 19:57:20

On Sun, Oct 13, 2024 at 08:28:57AM -0400, Thomas Krennwallner wrote:
> In v2 I've made changes to the patch incorporating the suggestions here:
> 
> * Default behaviour is to just fail with a report of all invalid databases
> 
> * A new option --skip-invalid-databases will then skip the checks, and
> would not transfer any invalid database to the new cluster. A warning
> with a report file will then follow after a successful run.
> 
> Dropping invalid databases in the old cluster will make invalid
> databases unrecoverable, so I opted for a skip over invalid databases
> approach that would leave invalid databases in the old cluster.
> 
> Apart from a missing --skip-invalid-databases test, does this attempt look OK?

I don't think there is enough demand for the feature of skipping invalid
databases because we have gotten few reports of such problems, and also
because your case is related to an external tool causing this problem.

What might be acceptable would be to add an option that would make
pg_upgrade more tolerant of problems in many areas, that is a lot more
research and discussion.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  When a patient asks the doctor, "Am I going to die?", he means 
  "Am I going to die soon?"

Re: pg_upgrade check for invalid databases

From

Bruce Momjian

Date:

01 November 2024, 03:36:25

On Fri, Oct 25, 2024 at 01:55:57PM +0200, Daniel Gustafsson wrote:
> > On 14 Oct 2024, at 18:57, Bruce Momjian <bruce@momjian.us> wrote:
> 
> > What might be acceptable would be to add an option that would make
> > pg_upgrade more tolerant of problems in many areas, that is a lot more
> > research and discussion.
> 
> I agree that the concept of having pg_upgrade perform (opt-in) skipping and/or
> repairs of the old cluster warrants a larger discussion in its own thread.
> There has been significant amount of energy spent recently to add structure to
> the checks, any new feature should be properly designed for the get-go.
> 
> In the meantime, the OP has a good point that it's a tad silly that pg_upgrade
> fails hard on invalid databases instead of detecting and reporting like how
> other errors are handled.  The attached adds this check and expands the report
> wording to cover it.

Agreed.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  When a patient asks the doctor, "Am I going to die?", he means 
  "Am I going to die soon?"

Re: pg_upgrade check for invalid databases

From

Daniel Gustafsson

Date:

06 November 2024, 18:03:35

> On 1 Nov 2024, at 01:36, Bruce Momjian <bruce@momjian.us> wrote:
> On Fri, Oct 25, 2024 at 01:55:57PM +0200, Daniel Gustafsson wrote:

>> In the meantime, the OP has a good point that it's a tad silly that pg_upgrade
>> fails hard on invalid databases instead of detecting and reporting like how
>> other errors are handled.  The attached adds this check and expands the report
>> wording to cover it.
>
> Agreed.

I've applied this part, the discussion on whether or not pg_upgrade should gain
capabilities to skip and/or fix issues should probably be carried over in a new
thread.

--
Daniel Gustafsson