Thread: Re: pg_upgrade check for invalid databases

Re: pg_upgrade check for invalid databases

From
Nathan Bossart
Date:
On Sun, Sep 29, 2024 at 08:45:50PM -0400, Thomas Krennwallner wrote:
> if a cluster contains invalid databases that we cannot connect to anymore,
> pg_upgrade would currently fail when trying to connect to the first
> encountered invalid database with
> 
> [...]
> 
> If there is more than one invalid database, we need to run pg_upgrade more
> than once (unless the user inspects pg_database).
> 
> I attached two small patches for PG 17 and PG 18 (can be easily backported
> to all previous versions upon request).  Instead of just failing to connect
> with an error, we collect all invalid databases in a report file
> invalid_databases.txt:

Should we have pg_upgrade skip invalid databases?  If the only valid action
is to drop them, IMHO it seems unnecessary to require users to manually
drop them before retrying pg_upgrade.

-- 
nathan



Re: pg_upgrade check for invalid databases

From
Tom Lane
Date:
Nathan Bossart <nathandbossart@gmail.com> writes:
> Should we have pg_upgrade skip invalid databases?  If the only valid action
> is to drop them, IMHO it seems unnecessary to require users to manually
> drop them before retrying pg_upgrade.

I was thinking the same.  But I wonder if there is any chance of
losing data that could be recoverable.  It feels like this should
not be a default behavior.

TBH I'm not finding anything very much wrong with the current
behavior... this has to be a rare situation, do we need to add
debatable behavior to make it easier?

            regards, tom lane



Re: pg_upgrade check for invalid databases

From
Daniel Gustafsson
Date:
> On 30 Sep 2024, at 16:55, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> TBH I'm not finding anything very much wrong with the current
> behavior... this has to be a rare situation, do we need to add
> debatable behavior to make it easier?

One argument would be to make the checks consistent, pg_upgrade generally tries
to report all the offending entries to help the user when fixing the source
database.  Not sure if it's a strong enough argument for carrying code which
really shouldn't see much use though.

--
Daniel Gustafsson




Re: pg_upgrade check for invalid databases

From
Tom Lane
Date:
Daniel Gustafsson <daniel@yesql.se> writes:
>> On 30 Sep 2024, at 16:55, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> TBH I'm not finding anything very much wrong with the current
>> behavior... this has to be a rare situation, do we need to add
>> debatable behavior to make it easier?

> One argument would be to make the checks consistent, pg_upgrade generally tries
> to report all the offending entries to help the user when fixing the source
> database.  Not sure if it's a strong enough argument for carrying code which
> really shouldn't see much use though.

OK, but the consistency argument would be to just report and fail.
I don't think there's a precedent in other pg_upgrade checks for
trying to fix problems automatically.

            regards, tom lane



Re: pg_upgrade check for invalid databases

From
Thomas Krennwallner
Date:
On 30/09/2024 17.29, Daniel Gustafsson wrote:
>> On 30 Sep 2024, at 16:55, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> 
>> TBH I'm not finding anything very much wrong with the current
>> behavior... this has to be a rare situation, do we need to add
>> debatable behavior to make it easier?
> 
> One argument would be to make the checks consistent, pg_upgrade generally tries
> to report all the offending entries to help the user when fixing the source
> database.  Not sure if it's a strong enough argument for carrying code which
> really shouldn't see much use though.
In general, I agree that this situation should be rare for deliberate 
DROP DATABASE interrupted in interactive sessions.

Unfortunately, for (popular) tools that perform automatic "temporary 
database" cleanup, we could recently see an increase in invalid databases.

The additional check for pg_upgrade was made necessary due to several 
unrelated customers having invalid databases that stem from left-over 
Prisma Migrate "shadow databases" [1]. We could not reproduce this 
Prisma Migrate issue yet, as those migrations happened some time ago. 
Maybe this bug really stems from a much older Prisma Migrate version and 
we only see the fallout now. This is still a TODO item.

But it appears that this tool can get interrupted "at the wrong time" 
while it is deleting temporary databases (probably a manual Ctrl-C), and 
clients are unaware that this can then leave behind invalid databases.

Those temporary databases do not cause any harm as they are not used 
anymore. But eventually, PG installations will be upgraded to the next 
major version, and it is only then when those invalid databases 
resurface after pg_upgrade fails to run the checks.

Long story short: interactive DROP DATABASE interrupts are rare (they do 
exist, but customers are usually aware). Automation tools on the other 
hand may run DROP DATABASE and when they get interrupted at the wrong 
time they will then produce several left-over invalid databases. 
pg_upgrade will then fail to run the checks.

[1] 
https://www.prisma.io/docs/orm/prisma-migrate/understanding-prisma-migrate/shadow-database





Re: pg_upgrade check for invalid databases

From
Daniel Gustafsson
Date:
> On 1 Oct 2024, at 00:20, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Daniel Gustafsson <daniel@yesql.se> writes:
>>> On 30 Sep 2024, at 16:55, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> TBH I'm not finding anything very much wrong with the current
>>> behavior... this has to be a rare situation, do we need to add
>>> debatable behavior to make it easier?
>
>> One argument would be to make the checks consistent, pg_upgrade generally tries
>> to report all the offending entries to help the user when fixing the source
>> database.  Not sure if it's a strong enough argument for carrying code which
>> really shouldn't see much use though.
>
> OK, but the consistency argument would be to just report and fail.
> I don't think there's a precedent in other pg_upgrade checks for
> trying to fix problems automatically.

Correct, sorry for being unclear.  The consistency argument would be to expand
pg_upgrade to report all invalid databases rather than just the first found;
attempting to fix problems would be a new behavior.

--
Daniel Gustafsson




Re: pg_upgrade check for invalid databases

From
Daniel Gustafsson
Date:
> On 1 Oct 2024, at 02:35, Thomas Krennwallner <tk@postsubmeta.net> wrote:
>
> On 30/09/2024 17.29, Daniel Gustafsson wrote:
>>> On 30 Sep 2024, at 16:55, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> TBH I'm not finding anything very much wrong with the current
>>> behavior... this has to be a rare situation, do we need to add
>>> debatable behavior to make it easier?
>> One argument would be to make the checks consistent, pg_upgrade generally tries
>> to report all the offending entries to help the user when fixing the source
>> database.  Not sure if it's a strong enough argument for carrying code which
>> really shouldn't see much use though.
> In general, I agree that this situation should be rare for deliberate DROP DATABASE interrupted in interactive
sessions.
>
> Unfortunately, for (popular) tools that perform automatic "temporary database" cleanup, we could recently see an
increasein invalid databases. 
>
> The additional check for pg_upgrade was made necessary due to several unrelated customers having invalid databases
thatstem from left-over Prisma Migrate "shadow databases" [1]. We could not reproduce this Prisma Migrate issue yet, as
thosemigrations happened some time ago. Maybe this bug really stems from a much older Prisma Migrate version and we
onlysee the fallout now. This is still a TODO item. 
>
> But it appears that this tool can get interrupted "at the wrong time" while it is deleting temporary databases
(probablya manual Ctrl-C), and clients are unaware that this can then leave behind invalid databases. 
>
> Those temporary databases do not cause any harm as they are not used anymore. But eventually, PG installations will
beupgraded to the next major version, and it is only then when those invalid databases resurface after pg_upgrade fails
torun the checks. 

Databases containing transient data no longer needed left by buggy tools is one
thing, but pg_upgrade won't be able to differentiate between those and invalid
databases of legitimate interest.  Allowing pg_upgrade to skip invalid
databases expose the risk of (potentially) valuable data being dropped during
the upgrade due to the user not having realized a rarely-used production
database was invalid.

> Long story short: interactive DROP DATABASE interrupts are rare (they do exist, but customers are usually aware).
Automationtools on the other hand may run DROP DATABASE and when they get interrupted at the wrong time they will then
produceseveral left-over invalid databases. pg_upgrade will then fail to run the checks. 

Checking and reporting all invalid databases during the check phase seems like
a user-friendly option here, I can agree that the current behaviour isn't great
for users experiencing this issue.

--
Daniel Gustafsson




Re: pg_upgrade check for invalid databases

From
Bruce Momjian
Date:
On Tue, Oct  1, 2024 at 09:28:54AM +0200, Daniel Gustafsson wrote:
> > On 1 Oct 2024, at 00:20, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > 
> > Daniel Gustafsson <daniel@yesql.se> writes:
> >>> On 30 Sep 2024, at 16:55, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >>> TBH I'm not finding anything very much wrong with the current
> >>> behavior... this has to be a rare situation, do we need to add
> >>> debatable behavior to make it easier?
> > 
> >> One argument would be to make the checks consistent, pg_upgrade generally tries
> >> to report all the offending entries to help the user when fixing the source
> >> database.  Not sure if it's a strong enough argument for carrying code which
> >> really shouldn't see much use though.
> > 
> > OK, but the consistency argument would be to just report and fail.
> > I don't think there's a precedent in other pg_upgrade checks for
> > trying to fix problems automatically.
> 
> Correct, sorry for being unclear.  The consistency argument would be to expand
> pg_upgrade to report all invalid databases rather than just the first found;
> attempting to fix problems would be a new behavior.

Yes, historically pg_upgrade will fail if it finds anything unusual,
mostly because what it does normally is already scary enough.  If users
what pg_upgrade to do cleanups, it would be enabled by a separate flag,
or even a new command-line app.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  When a patient asks the doctor, "Am I going to die?", he means 
  "Am I going to die soon?"



Re: pg_upgrade check for invalid databases

From
Nathan Bossart
Date:
On Mon, Oct 07, 2024 at 03:37:35PM -0400, Bruce Momjian wrote:
> On Tue, Oct  1, 2024 at 09:28:54AM +0200, Daniel Gustafsson wrote:
>> Correct, sorry for being unclear.  The consistency argument would be to expand
>> pg_upgrade to report all invalid databases rather than just the first found;
>> attempting to fix problems would be a new behavior.
> 
> Yes, historically pg_upgrade will fail if it finds anything unusual,
> mostly because what it does normally is already scary enough.  If users
> what pg_upgrade to do cleanups, it would be enabled by a separate flag,
> or even a new command-line app.

While I suspect it's rare that someone CTRL-C's out of an accidental DROP
DATABASE and then runs pg_upgrade before trying to recover the data, I
agree with the principle of having pg_upgrade fail by default for things
like this.  If we did add a new flag, the new invalid database report that
Daniel mentions could say something like "try again with
--skip-invalid-databases to have pg_upgrade automatically drop invalid
databases."

-- 
nathan



Re: pg_upgrade check for invalid databases

From
Daniel Gustafsson
Date:
> On 7 Oct 2024, at 22:04, Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> On Mon, Oct 07, 2024 at 03:37:35PM -0400, Bruce Momjian wrote:
>> On Tue, Oct  1, 2024 at 09:28:54AM +0200, Daniel Gustafsson wrote:
>>> Correct, sorry for being unclear.  The consistency argument would be to expand
>>> pg_upgrade to report all invalid databases rather than just the first found;
>>> attempting to fix problems would be a new behavior.
>>
>> Yes, historically pg_upgrade will fail if it finds anything unusual,
>> mostly because what it does normally is already scary enough.  If users
>> what pg_upgrade to do cleanups, it would be enabled by a separate flag,
>> or even a new command-line app.
>
> While I suspect it's rare that someone CTRL-C's out of an accidental DROP
> DATABASE and then runs pg_upgrade before trying to recover the data, I
> agree with the principle of having pg_upgrade fail by default for things
> like this.  If we did add a new flag, the new invalid database report that
> Daniel mentions could say something like "try again with
> --skip-invalid-databases to have pg_upgrade automatically drop invalid
> databases."

If we are teaching pg_upgrade to handle errors, either by skipping or by
fixing, then I believe this is the right way to go about it.  A successful run
should probably also create a report of the databases which were skipped.

--
Daniel Gustafsson




Re: pg_upgrade check for invalid databases

From
Bruce Momjian
Date:
On Sun, Oct 13, 2024 at 08:28:57AM -0400, Thomas Krennwallner wrote:
> In v2 I've made changes to the patch incorporating the suggestions here:
> 
> * Default behaviour is to just fail with a report of all invalid databases
> 
> * A new option --skip-invalid-databases will then skip the checks, and
> would not transfer any invalid database to the new cluster. A warning
> with a report file will then follow after a successful run.
> 
> Dropping invalid databases in the old cluster will make invalid
> databases unrecoverable, so I opted for a skip over invalid databases
> approach that would leave invalid databases in the old cluster.
> 
> Apart from a missing --skip-invalid-databases test, does this attempt look OK?

I don't think there is enough demand for the feature of skipping invalid
databases because we have gotten few reports of such problems, and also
because your case is related to an external tool causing this problem.

What might be acceptable would be to add an option that would make
pg_upgrade more tolerant of problems in many areas, that is a lot more
research and discussion.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  When a patient asks the doctor, "Am I going to die?", he means 
  "Am I going to die soon?"



Re: pg_upgrade check for invalid databases

From
Bruce Momjian
Date:
On Fri, Oct 25, 2024 at 01:55:57PM +0200, Daniel Gustafsson wrote:
> > On 14 Oct 2024, at 18:57, Bruce Momjian <bruce@momjian.us> wrote:
> 
> > What might be acceptable would be to add an option that would make
> > pg_upgrade more tolerant of problems in many areas, that is a lot more
> > research and discussion.
> 
> I agree that the concept of having pg_upgrade perform (opt-in) skipping and/or
> repairs of the old cluster warrants a larger discussion in its own thread.
> There has been significant amount of energy spent recently to add structure to
> the checks, any new feature should be properly designed for the get-go.
> 
> In the meantime, the OP has a good point that it's a tad silly that pg_upgrade
> fails hard on invalid databases instead of detecting and reporting like how
> other errors are handled.  The attached adds this check and expands the report
> wording to cover it.

Agreed.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  When a patient asks the doctor, "Am I going to die?", he means 
  "Am I going to die soon?"



Re: pg_upgrade check for invalid databases

From
Daniel Gustafsson
Date:
> On 1 Nov 2024, at 01:36, Bruce Momjian <bruce@momjian.us> wrote:
> On Fri, Oct 25, 2024 at 01:55:57PM +0200, Daniel Gustafsson wrote:

>> In the meantime, the OP has a good point that it's a tad silly that pg_upgrade
>> fails hard on invalid databases instead of detecting and reporting like how
>> other errors are handled.  The attached adds this check and expands the report
>> wording to cover it.
>
> Agreed.

I've applied this part, the discussion on whether or not pg_upgrade should gain
capabilities to skip and/or fix issues should probably be carried over in a new
thread.

--
Daniel Gustafsson