Thread: pg15b2: large objects lost on upgrade
I noticed this during beta1, but dismissed the issue when it wasn't easily
reproducible. Now, I saw the same problem while upgrading from beta1 to beta2,
so couldn't dismiss it. It turns out that LOs are lost if VACUUM FULL was run.

| /usr/pgsql-15b1/bin/initdb --no-sync -D pg15b1.dat -k
| /usr/pgsql-15b1/bin/postgres -D pg15b1.dat -c logging_collector=no -p 5678 -k /tmp&
| psql -h /tmp postgres -p 5678 -c '\lo_import /etc/shells' -c 'VACUUM FULL pg_largeobject'
| rm -fr pg15b2.dat && /usr/pgsql-15b2/bin/initdb --no-sync -k -D pg15b2.dat && /usr/pgsql-15b2/bin/pg_upgrade -d pg15b1.dat -D pg15b2.dat -b /usr/pgsql-15b1/bin
| /usr/pgsql-15b2/bin/postgres -D pg15b2.dat -c logging_collector=no -p 5678 -k /tmp&

Or, for your convenience, with paths in tmp_install:

| ./tmp_install/usr/local/pgsql/bin/initdb --no-sync -D pg15b1.dat -k
| ./tmp_install/usr/local/pgsql/bin/postgres -D pg15b1.dat -c logging_collector=no -p 5678 -k /tmp&
| psql -h /tmp postgres -p 5678 -c '\lo_import /etc/shells' -c 'VACUUM FULL pg_largeobject'
| rm -fr pg15b2.dat && ./tmp_install/usr/local/pgsql/bin/initdb --no-sync -k -D pg15b2.dat && ./tmp_install/usr/local/pgsql/bin/pg_upgrade -d pg15b1.dat -D pg15b2.dat -b ./tmp_install/usr/local/pgsql/bin
| ./tmp_install/usr/local/pgsql/bin/postgres -D pg15b2.dat -c logging_collector=no -p 5678 -k /tmp&

postgres=# table pg_largeobject_metadata ;
 16384 | 10 |

postgres=# \lo_list
 16384 | pryzbyj |

postgres=# \lo_export 16384 /dev/stdout
lo_export

postgres=# table pg_largeobject;

postgres=# \dt+ pg_largeobject
 pg_catalog | pg_largeobject | table | pryzbyj | permanent | heap | 0 bytes |

I reproduced the problem at 9a974cbcba but not its parent commit.

commit 9a974cbcba005256a19991203583a94b4f9a21a9
Author: Robert Haas <rhaas@postgresql.org>
Date:   Mon Jan 17 13:32:44 2022 -0500

    pg_upgrade: Preserve relfilenodes and tablespace OIDs.

--
Justin
On Fri, Jul 01, 2022 at 06:14:13PM -0500, Justin Pryzby wrote:
> I reproduced the problem at 9a974cbcba but not its parent commit.
>
> commit 9a974cbcba005256a19991203583a94b4f9a21a9
> Author: Robert Haas <rhaas@postgresql.org>
> Date:   Mon Jan 17 13:32:44 2022 -0500
>
>     pg_upgrade: Preserve relfilenodes and tablespace OIDs.

Oops. Robert? This reproduces as well when using pg_upgrade with the same
version as origin and target, meaning that this extra step in the TAP test
is able to reproduce the issue (extra VACUUM FULL before chdir'ing):

--- a/src/bin/pg_upgrade/t/002_pg_upgrade.pl
+++ b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
@@ -208,6 +208,8 @@ if (defined($ENV{oldinstall}))
 	}
 }
 
+$oldnode->safe_psql("regression", "VACUUM FULL pg_largeobject;");
+
 # In a VPATH build, we'll be started in the source directory, but we want
--
Michael
Attachment
On Fri, Jul 1, 2022 at 7:14 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> I noticed this during beta1, but dismissed the issue when it wasn't easily
> reproducible. Now, I saw the same problem while upgrading from beta1 to beta2,
> so couldn't dismiss it. It turns out that LOs are lost if VACUUM FULL was run.

Yikes. That's really bad, and I have no idea what might be causing it,
either. I'll plan to investigate this on Tuesday unless someone gets to it
before then.

--
Robert Haas
EDB: http://www.enterprisedb.com
On Sat, Jul 02, 2022 at 08:34:04AM -0400, Robert Haas wrote:
> On Fri, Jul 1, 2022 at 7:14 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> > I noticed this during beta1, but dismissed the issue when it wasn't easily
> > reproducible. Now, I saw the same problem while upgrading from beta1 to beta2,
> > so couldn't dismiss it. It turns out that LOs are lost if VACUUM FULL was run.
>
> Yikes. That's really bad, and I have no idea what might be causing it,
> either. I'll plan to investigate this on Tuesday unless someone gets
> to it before then.

I suppose it's like Bruce said, here.
https://www.postgresql.org/message-id/20210601140949.GC22012%40momjian.us

|One tricky case is pg_largeobject, which is copied from the old to new
|cluster since it has user data. To preserve that relfilenode, you would
|need to have pg_upgrade perform cluster surgery in each database to
|renumber its relfilenode to match since it is created by initdb. I
|can't think of a case where pg_upgrade already does something like that.

Rather than setting the filenode of the next object as for user tables,
pg_upgrade needs to UPDATE the relfilenode.

This patch "works for me" but feel free to improve on it.
Attachment
On Sat, Jul 02, 2022 at 08:34:04AM -0400, Robert Haas wrote:
> On Fri, Jul 1, 2022 at 7:14 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> > I noticed this during beta1, but dismissed the issue when it wasn't easily
> > reproducible. Now, I saw the same problem while upgrading from beta1 to beta2,
> > so couldn't dismiss it. It turns out that LOs are lost if VACUUM FULL was run.
>
> Yikes. That's really bad, and I have no idea what might be causing it,
> either. I'll plan to investigate this on Tuesday unless someone gets
> to it before then.

As far as I can see the data is still there, it's just that the new cluster
keeps its default relfilenode instead of preserving the old cluster's value:

regression=# table pg_largeobject;
 loid | pageno | data
------+--------+------
(0 rows)

regression=# select oid, relfilenode from pg_class where relname = 'pg_largeobject';
 oid  | relfilenode
------+-------------
 2613 |        2613
(1 row)

-- using the value from the old cluster
regression=# update pg_class set relfilenode = 39909 where oid = 2613;
UPDATE 1

regression=# table pg_largeobject;
 loid  | pageno |      data
-------+--------+------------------
 33211 |      0 | \x0a4920776[...]
 34356 |      0 | \xdeadbeef
(2 rows)
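A quick way to see the mismatch described above is to compare what pg_class
claims against the path the server actually reads, using the stock functions
pg_relation_filenode()/pg_relation_filepath() (a sketch; 39909 is the value
from the example above, not a fixed constant):

    regression=# SELECT oid, relfilenode, pg_relation_filepath(oid)
                 FROM pg_class WHERE relname = 'pg_largeobject';

After the broken upgrade this reports relfilenode 2613, i.e. the empty
initdb-created file, while the data copied over from the old cluster sits in
a file (39909 here) that nothing in the catalogs points at anymore.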
I was able to reproduce the issue. Also, the issue does not occur with code
from before the preserve-relfilenode commit.
I tested your patch and it fixes the problem.
I am currently analyzing a few things related to the issue. I will come back once my analysis is completed.
On Sat, Jul 2, 2022 at 9:19 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> On Sat, Jul 02, 2022 at 08:34:04AM -0400, Robert Haas wrote:
> > On Fri, Jul 1, 2022 at 7:14 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> > > I noticed this during beta1, but dismissed the issue when it wasn't easily
> > > reproducible. Now, I saw the same problem while upgrading from beta1 to beta2,
> > > so couldn't dismiss it. It turns out that LOs are lost if VACUUM FULL was run.
> >
> > Yikes. That's really bad, and I have no idea what might be causing it,
> > either. I'll plan to investigate this on Tuesday unless someone gets
> > to it before then.
>
> I suppose it's like Bruce said, here.
> https://www.postgresql.org/message-id/20210601140949.GC22012%40momjian.us
>
> |One tricky case is pg_largeobject, which is copied from the old to new
> |cluster since it has user data. To preserve that relfilenode, you would
> |need to have pg_upgrade perform cluster surgery in each database to
> |renumber its relfilenode to match since it is created by initdb. I
> |can't think of a case where pg_upgrade already does something like that.
>
> Rather than setting the filenode of the next object as for user tables,
> pg_upgrade needs to UPDATE the relfilenode.
>
> This patch "works for me" but feel free to improve on it.
On Sat, Jul 2, 2022 at 11:49 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> I suppose it's like Bruce said, here.
>
> https://www.postgresql.org/message-id/20210601140949.GC22012%40momjian.us

Well, I feel dumb. I remember reading that email back when Bruce sent it, but
it seems that it slipped out of my head between then and when I committed. I
think your patch is fine, except that I think maybe we should adjust this
dump comment:

-- For binary upgrade, set pg_largeobject relfrozenxid, relminmxid and relfilenode

Perhaps:

-- For binary upgrade, preserve values for pg_largeobject and its index

Listing the exact properties preserved seems less important to me than
mentioning that the second UPDATE statement is for its index -- because if
you look at the SQL that's generated, you can see what's being preserved, but
you don't automatically know why there are two UPDATE statements or what the
rows with those OIDs are.

I had a moment of panic this morning where I thought maybe the whole patch
needed to be reverted. I was worried that we might need to preserve the OID
of every system table and index. Otherwise, what happens if the OID counter
in the old cluster wraps around and some user-created object gets an OID that
the system tables are using in the new cluster? However, I think this can't
happen, because when the OID counter wraps around, it wraps around to 16384,
and the relfilenode values for newly created system tables and indexes are
all less than 16384. So I believe we only need to fix pg_largeobject and its
index, and I think your patch does that.

Anyone else see it differently?

--
Robert Haas
EDB: http://www.enterprisedb.com
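For reference, this is roughly the shape of the SQL the binary-upgrade dump
emits with the patch applied (a sketch, with illustrative relfilenode values;
2613 is pg_largeobject and 2683 is pg_largeobject_loid_pn_index):

    -- For binary upgrade, preserve values for pg_largeobject and its index
    UPDATE pg_catalog.pg_class
    SET relfrozenxid = '990', relminmxid = '1', relfilenode = '39909'
    WHERE oid = 2613;
    UPDATE pg_catalog.pg_class
    SET relfrozenxid = '0', relminmxid = '0', relfilenode = '39912'
    WHERE oid = 2683;

The second UPDATE, for the index, is the one the proposed comment is meant to
explain.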
On Tue, Jul 05, 2022 at 12:43:54PM -0400, Robert Haas wrote:
> On Sat, Jul 2, 2022 at 11:49 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> > I suppose it's like Bruce said, here.
> >
> > https://www.postgresql.org/message-id/20210601140949.GC22012%40momjian.us
>
> Well, I feel dumb. I remember reading that email back when Bruce sent
> it, but it seems that it slipped out of my head between then and when
> I committed. I think your patch is fine, except that I think maybe we

My patch also leaves a 0 byte file around from initdb, which is harmless, but
dirty.

I've seen before where a bunch of 0 byte files are abandoned in an
otherwise-empty tablespace, with no associated relation, and I have to "rm"
them to be able to drop the tablespace. Maybe that's a known issue, maybe
it's due to crashes or some other edge case, maybe it's of no consequence,
and maybe it's already been fixed or is being fixed already. But it'd be nice
to avoid another way to have 0 byte files - especially ones named with system
OIDs.

> Listing the exact properties preserved seems less important to me than
> mentioning that the second UPDATE statement is for its index --

+1

--
Justin
On Tue, Jul 5, 2022 at 12:56 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> My patch also leaves a 0 byte file around from initdb, which is harmless, but
> dirty.
>
> I've seen before where a bunch of 0 byte files are abandoned in an
> otherwise-empty tablespace, with no associated relation, and I have to "rm"
> them to be able to drop the tablespace. Maybe that's a known issue, maybe
> it's due to crashes or some other edge case, maybe it's of no consequence,
> and maybe it's already been fixed or is being fixed already. But it'd be nice
> to avoid another way to have 0 byte files - especially ones named with system
> OIDs.

Do you want to add something to the patch to have pg_upgrade remove the stray
file?

--
Robert Haas
EDB: http://www.enterprisedb.com
On Tue, Jul 05, 2022 at 02:40:21PM -0400, Robert Haas wrote:
> On Tue, Jul 5, 2022 at 12:56 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> > My patch also leaves a 0 byte file around from initdb, which is harmless, but
> > dirty.
> >
> > I've seen before where a bunch of 0 byte files are abandoned in an
> > otherwise-empty tablespace, with no associated relation, and I have to "rm"
> > them to be able to drop the tablespace. Maybe that's a known issue, maybe
> > it's due to crashes or some other edge case, maybe it's of no consequence,
> > and maybe it's already been fixed or is being fixed already. But it'd be nice
> > to avoid another way to have 0 byte files - especially ones named with system
> > OIDs.
>
> Do you want to add something to the patch to have pg_upgrade remove
> the stray file?

I'm looking into it, but it'd help to hear suggestions about where to put it.
My current ideas aren't very good.

--
Justin
On Wed, Jul 6, 2022 at 7:56 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> I'm looking into it, but it'd help to hear suggestions about where to put it.
> My current ideas aren't very good.

In main() there is a comment that begins "Most failures happen in
create_new_objects(), which has just completed at this point." I am thinking
you might want to insert a new function call just before that comment, like
remove_orphaned_files() or tidy_up_new_cluster().

Another option could be to do something at the beginning of
transfer_all_new_tablespaces().

--
Robert Haas
EDB: http://www.enterprisedb.com
On Wed, Jul 06, 2022 at 08:25:04AM -0400, Robert Haas wrote:
> On Wed, Jul 6, 2022 at 7:56 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> > I'm looking into it, but it'd help to hear suggestions about where to put it.
> > My current ideas aren't very good.
>
> In main() there is a comment that begins "Most failures happen in
> create_new_objects(), which has just completed at this point." I am
> thinking you might want to insert a new function call just before that
> comment, like remove_orphaned_files() or tidy_up_new_cluster().
>
> Another option could be to do something at the beginning of
> transfer_all_new_tablespaces().

That seems like the better option, since it has access to the cluster's
filenodes.

I checked upgrades from 9.2, upgrades with/without vacuum full, and upgrades
with a DB tablespace.

Maybe it's a good idea to check that the file is empty before unlinking...

--
Justin
Attachment
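One way to express the emptiness check Justin floats here, sketched as SQL
against the running new cluster using the stock pg_stat_file() and
pg_relation_filepath() functions (pg_upgrade itself would do the equivalent
stat() on the file path from C):

    SELECT size = 0 AS initdb_file_is_empty
    FROM pg_stat_file(pg_relation_filepath('pg_catalog.pg_largeobject'));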
On Thu, Jul 7, 2022 at 1:10 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> Maybe it's a good idea to check that the file is empty before unlinking...

If we want to verify that there are no large objects in the cluster, we could
do that in check_new_cluster_is_empty(). However, even if there aren't, the
length of the file could still be more than 0, if there were some large
objects previously and then they were removed. So it's not entirely obvious
to me that we should refuse to remove a non-empty file.

--
Robert Haas
EDB: http://www.enterprisedb.com
On Thu, Jul 7, 2022 at 01:38:44PM -0400, Robert Haas wrote:
> On Thu, Jul 7, 2022 at 1:10 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> > Maybe it's a good idea to check that the file is empty before unlinking...
>
> If we want to verify that there are no large objects in the cluster,
> we could do that in check_new_cluster_is_empty(). However, even if
> there aren't, the length of the file could still be more than 0, if
> there were some large objects previously and then they were removed.
> So it's not entirely obvious to me that we should refuse to remove a
> non-empty file.

Uh, that initdb-created pg_largeobject file should not have any data in it
ever, as far as I know at that point in pg_upgrade. How would values have
gotten in there? Via pg_dump?

--
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Indecision is a decision.  Inaction is an action.  Mark Batterson
On Thu, Jul 7, 2022 at 2:24 PM Bruce Momjian <bruce@momjian.us> wrote:
> On Thu, Jul 7, 2022 at 01:38:44PM -0400, Robert Haas wrote:
> > On Thu, Jul 7, 2022 at 1:10 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> > > Maybe it's a good idea to check that the file is empty before unlinking...
> >
> > If we want to verify that there are no large objects in the cluster,
> > we could do that in check_new_cluster_is_empty(). However, even if
> > there aren't, the length of the file could still be more than 0, if
> > there were some large objects previously and then they were removed.
> > So it's not entirely obvious to me that we should refuse to remove a
> > non-empty file.
>
> Uh, that initdb-created pg_largeobject file should not have any data in it
> ever, as far as I know at that point in pg_upgrade. How would values have
> gotten in there? Via pg_dump?

I was thinking of the case where the user had done it manually before running
pg_upgrade.

--
Robert Haas
EDB: http://www.enterprisedb.com
On Thu, Jul 07, 2022 at 02:38:44PM -0400, Robert Haas wrote:
> On Thu, Jul 7, 2022 at 2:24 PM Bruce Momjian <bruce@momjian.us> wrote:
> > Uh, that initdb-created pg_largeobject file should not have any data in it
> > ever, as far as I know at that point in pg_upgrade. How would values have
> > gotten in there? Via pg_dump?
>
> I was thinking of the case where the user had done it manually before running
> pg_upgrade.

We're referring to the new cluster, which should have been initdb'd more or
less immediately before running pg_upgrade [0].

It'd be weird to me if someone were to initdb a new cluster, then create some
large objects, and then maybe delete them, and then run pg_upgrade. Why
wouldn't they do their work on the old cluster before upgrading, or on the
new cluster afterwards?

Does anybody actually do anything significant between initdb and pg_upgrade?
Is that considered to be supported? It seems like pg_upgrade could itself run
initdb, with the correct options for locale, checksum, block size, etc.
(although it probably has to support the existing way to handle "compatible
encodings").

Actually, I think check_new_cluster_is_empty() ought to prohibit doing work
between initdb and pg_upgrade by checking that no objects have *ever* been
created in the new cluster, by checking that NextOid == 16384. But I have a
separate thread about "pg_upgrade allows itself to be re-run", and this has
more to do with that than with whether to check that the file is empty before
removing it.

--
Justin
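A sketch of the kind of guard Justin suggests, using pg_control_checkpoint()
(hedged: the OID counter recorded in pg_control only advances at checkpoints
and OIDs are logged ahead in batches, so pg_upgrade would read it via
pg_controldata rather than a live query; this only illustrates the idea):

    SELECT next_oid FROM pg_control_checkpoint();
    -- Justin's proposed test: the counter should still sit at
    -- FirstNormalObjectId (16384); a larger value suggests objects have
    -- been created in the supposedly pristine new cluster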
Justin Pryzby <pryzby@telsasoft.com> writes:
> On Thu, Jul 07, 2022 at 02:38:44PM -0400, Robert Haas wrote:
>> On Thu, Jul 7, 2022 at 2:24 PM Bruce Momjian <bruce@momjian.us> wrote:
>>> Uh, that initdb-created pg_largeobject file should not have any data in it
>>> ever, as far as I know at that point in pg_upgrade. How would values have
>>> gotten in there? Via pg_dump?

>> I was thinking of the case where the user had done it manually before running
>> pg_upgrade.

> We're referring to the new cluster, which should have been initdb'd more or
> less immediately before running pg_upgrade [0].

> It'd be weird to me if someone were to initdb a new cluster, then create some
> large objects, and then maybe delete them, and then run pg_upgrade.

AFAIK you're voiding the warranty if you make any changes at all in the
destination cluster before pg_upgrade'ing. As an example, if you created a
table there you'd be risking an OID and/or relfilenode collision with
something due to be imported from the source cluster.

> Actually, I think check_new_cluster_is_empty() ought to prohibit doing work
> between initdb and pg_upgrade by checking that no objects have *ever* been
> created in the new cluster, by checking that NextOid == 16384.

It would be good to have some such check; I'm not sure that that one in
particular is the best option.

> But I have a
> separate thread about "pg_upgrade allows itself to be re-run", and this has
> more to do with that than with whether to check that the file is empty before
> removing it.

Yeah, that's another foot-gun in the same area.

			regards, tom lane
On Thu, Jul 7, 2022 at 2:44 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> We're referring to the new cluster, which should have been initdb'd more or
> less immediately before running pg_upgrade [0].
>
> It'd be weird to me if someone were to initdb a new cluster, then create some
> large objects, and then maybe delete them, and then run pg_upgrade. Why
> wouldn't they do their work on the old cluster before upgrading, or on the
> new cluster afterwards?
>
> Does anybody actually do anything significant between initdb and pg_upgrade?
> Is that considered to be supported? It seems like pg_upgrade could itself run
> initdb, with the correct options for locale, checksum, block size, etc.
> (although it probably has to support the existing way to handle "compatible
> encodings").
>
> Actually, I think check_new_cluster_is_empty() ought to prohibit doing work
> between initdb and pg_upgrade by checking that no objects have *ever* been
> created in the new cluster, by checking that NextOid == 16384. But I have a
> separate thread about "pg_upgrade allows itself to be re-run", and this has
> more to do with that than with whether to check that the file is empty before
> removing it.

I think you're getting at a really good point here which is also my point: we
assume that nothing significant has happened between when the cluster was
created and when pg_upgrade is run, but we don't check it. Either we
shouldn't assume it, or we should check it.

So, is such activity ever legitimate? I think there are people doing it. The
motivation is that maybe you have a dump from the old database that doesn't
quite restore on the new version, but by doing something to the new cluster,
you can make it restore. For instance, maybe there are some functions that
used to be part of core and are now only available in an extension. That's
going to make pg_upgrade's dump-and-restore workflow fail, but if you install
that extension onto the new cluster, perhaps you can work around the problem.
It doesn't have to be an extension, even. Maybe some function in core just
got an extra argument, and you're using it, so the calls to that function
cause dump-and-restore to fail. You might try overloading it in the new
database with the old calling sequence to get things working.

Now, are these kinds of things considered to be supported? Well, I don't know
that we've made any statement about that one way or the other. Perhaps they
are not. But I can see why people want to use workarounds like this. The
alternative is having to dump-and-restore instead of an in-place upgrade, and
that's painfully slow by comparison. pg_upgrade itself doesn't give you any
tools to deal with this kind of situation, but the process is just loose
enough to allow people to insert their own workarounds, so they do. I'm sure
I'd do the same, in their shoes.

My view on this is that, while we probably don't want to make such things
officially supported, I don't think we should ban it outright, either. We
probably can't support an upgrade after the next cluster has been subjected
to arbitrary amounts of tinkering, but we're making a mistake if we write
code that has fragile assumptions for no really good reason. I think we can
do better than this excerpt from your patch, for example:

+       /* Keep track of whether a filenode matches the OID */
+       if (maps[mapnum].relfilenumber == LargeObjectRelationId)
+               *has_lotable = true;
+       if (maps[mapnum].relfilenumber == LargeObjectLOidPNIndexId)
+               *has_loindex = true;

I spent a while struggling to understand this because it seems to me that
every database has an LO table and an LO index, so what's up with these
names? I think what these names are really tracking is whether the
relfilenumber of pg_largeobject and its index in the old database had their
default values. But this makes the assumption that the LO table and LO index
in the new database have never been subjected to VACUUM FULL or CLUSTER and,
while there's no real reason to do that, I can't quite see what the point of
such an unnecessary and fragile assumption might be. If Dilip's patch to make
relfilenodes 56 bits gets committed, this is going to break anyway, because
with that patch, the relfilenode for a table or index doesn't start out being
equal to its OID.

Perhaps a better solution to this particular problem is to remove the backing
files for the large object table and index *before* restoring the dump,
deciding what files to remove by asking the running server for the file path.
It might seem funny to allow for dangling pg_class entries, but we're going
to create that situation for all other user rels anyway, and pg_upgrade
treats pg_largeobject as a user rel.

--
Robert Haas
EDB: http://www.enterprisedb.com
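A sketch of how pg_upgrade could ask the running new cluster which files
those are before the dump is restored (pg_relation_filepath() is a stock
function; the unlink step itself would live in pg_upgrade's C code):

    SELECT pg_relation_filepath('pg_catalog.pg_largeobject') AS lo_heap,
           pg_relation_filepath('pg_catalog.pg_largeobject_loid_pn_index') AS lo_index;
    -- returns paths such as base/<dboid>/2613, relative to the new data
    -- directory; those files (and any extra forks/segments) would be
    -- removed before pg_restore recreates pg_largeobject's contents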
On Tue, Jul 5, 2022 at 12:43:54PM -0400, Robert Haas wrote:
> On Sat, Jul 2, 2022 at 11:49 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> > I suppose it's like Bruce said, here.
> >
> > https://www.postgresql.org/message-id/20210601140949.GC22012%40momjian.us
>
> Well, I feel dumb. I remember reading that email back when Bruce sent
> it, but it seems that it slipped out of my head between then and when
> I committed. I think your patch is fine, except that I think maybe we

It happens to us all.

> I had a moment of panic this morning where I thought maybe the whole

Yes, I have had those panics too.

> patch needed to be reverted. I was worried that we might need to
> preserve the OID of every system table and index. Otherwise, what
> happens if the OID counter in the old cluster wraps around and some
> user-created object gets an OID that the system tables are using in
> the new cluster? However, I think this can't happen, because when the
> OID counter wraps around, it wraps around to 16384, and the
> relfilenode values for newly created system tables and indexes are all
> less than 16384. So I believe we only need to fix pg_largeobject and
> its index, and I think your patch does that.

So, let me explain how I look at this. There are two number-spaces, oid and
relfilenode. In each number-space, there are system-assigned ones less than
16384, and higher ones for post-initdb use. What we did in pre-PG15 was to
preserve only oids, and have the relfilenode match the oid, and we have
discussed the negatives of this. For PG 15+, we preserve relfilenodes too.

These number assignment cases only work if we handle _all_ numbering, except
for non-pg_largeobject system tables. In pre-PG15, pg_largeobject was easily
handled because initdb already assigned the oid and relfilenode to be the
same for pg_largeobject, so a simple copy worked fine.

pg_largeobject is an anomaly in PG 15 because it is assigned a relfilenode in
the system number space by initdb, but then it needs to be potentially
renamed into the relfilenode user number space. This is the basis for my
email as already posted:

	https://www.postgresql.org/message-id/20210601140949.GC22012%40momjian.us

You are right to be concerned since you are spanning number spaces, but I
think you are fine because the relfilenode in the user-space cannot have been
used since it already was being used in each database. It is true we never
had a per-database rename like this before.

--
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Indecision is a decision.  Inaction is an action.  Mark Batterson
On Thu, Jul 7, 2022 at 4:16 PM Bruce Momjian <bruce@momjian.us> wrote:
> You are right to be concerned since you are spanning number spaces, but
> I think you are fine because the relfilenode in the user-space cannot
> have been used since it already was being used in each database. It is
> true we never had a per-database rename like this before.

Thanks for checking over the reasoning, and the kind words in general.

I just committed Justin's fix for the bug, without fixing the fact that the
new cluster's original pg_largeobject files will be left orphaned afterward.
That's a relatively minor problem by comparison, and it seemed best to me not
to wait too long to get the main issue addressed.

--
Robert Haas
EDB: http://www.enterprisedb.com
On Thu, Jul 07, 2022 at 03:11:38PM -0400, Robert Haas wrote:
> point: we assume that nothing significant has happened between when
> the cluster was created and when pg_upgrade is run, but we don't check
> it. Either we shouldn't assume it, or we should check it.
>
> So, is such activity ever legitimate? I think there are people doing
> it. The motivation is that maybe you have a dump from the old database
> that doesn't quite restore on the new version, but by doing something
> to the new cluster, you can make it restore. For instance, maybe there
> are some functions that used to be part of core and are now only
> available in an extension. That's going to make pg_upgrade's
> dump-and-restore workflow fail, but if you install that extension onto
> the new cluster, perhaps you can work around the problem. It doesn't
> have to be an extension, even. Maybe some function in core just got an
> extra argument, and you're using it, so the calls to that function
> cause dump-and-restore to fail. You might try overloading it in the
> new database with the old calling sequence to get things working.

I don't think that's even possible. pg_upgrade drops template1 and postgres
before upgrading:

 * template1 database will already exist in the target installation,
 * so tell pg_restore to drop and recreate it; otherwise we would fail
 * to propagate its database-level properties.

 * postgres database will already exist in the target installation, so
 * tell pg_restore to drop and recreate it; otherwise we would fail to
 * propagate its database-level properties.

For any other DBs, you'd hit an error if, after initdb'ing, you started the
new cluster, connected to it, created a DB (?!) and then tried to upgrade:

pg_restore: error: could not execute query: ERROR:  database "pryzbyj" already exists

So if people start, connect, and then futz with a cluster before upgrading
it, it must be for global stuff (roles, tablespaces), and not per-DB stuff.
Also, pg_upgrade refuses to run if additional roles are defined... So I'm not
seeing what someone could do on the new cluster. That supports the idea that
it'd be okay to refuse to upgrade anything other than a pristine cluster.

> Now, are these kinds of things considered to be supported? Well, I
> don't know that we've made any statement about that one way or the
> other. Perhaps they are not. But I can see why people want to use
> workarounds like this. The alternative is having to dump-and-restore
> instead of an in-place upgrade, and that's painfully slow by
> comparison.

The alternative in cases that I know about is to fix the old DB to allow it
to be upgraded. check.c has a list of the things that aren't upgradable, and
the fixes are things like ALTER TABLE ... DROP OIDS. We just added another
one to handle v14 aggregates (09878cdd4).

> My view on this is that, while we probably don't want to make such
> things officially supported, I don't think we should ban it outright,
> either. We probably can't support an upgrade after the next cluster
> has been subjected to arbitrary amounts of tinkering, but we're making
> a mistake if we write code that has fragile assumptions for no really
> good reason. I think we can do better than this excerpt from your
> patch, for example:
>
> +       /* Keep track of whether a filenode matches the OID */
> +       if (maps[mapnum].relfilenumber == LargeObjectRelationId)
> +               *has_lotable = true;
> +       if (maps[mapnum].relfilenumber == LargeObjectLOidPNIndexId)
> +               *has_loindex = true;
>
> I spent a while struggling to understand this because it seems to me
> that every database has an LO table and an LO index, so what's up with
> these names? I think what these names are really tracking is whether
> the relfilenumber of pg_largeobject and its index in the old database
> had their default values.

Yes, has_lotable means "has a LO table whose filenode matches the OID".
I will solicit suggestions for a better name.

> But this makes the assumption that the LO
> table and LO index in the new database have never been subjected to
> VACUUM FULL or CLUSTER and, while there's no real reason to do that, I
> can't quite see what the point of such an unnecessary and fragile
> assumption might be.

The idea of running initdb, starting the cluster, and connecting to it to run
VACUUM FULL scares me. Now that I think about it, it might be almost
inconsequential, since the initial DBs are dropped, and the upgrade will fail
if any non-template DB exists. But .. maybe something exciting happens if you
vacuum full a shared catalog...

Yup. With my patch:

./tmp_install/usr/local/pgsql/bin/initdb --no-sync -D pg15b1.dat -k
./tmp_install/usr/local/pgsql/bin/postgres -D pg15b1.dat -c logging_collector=no -p 5678 -k /tmp&
postgres=# \lo_import /etc/shells
postgres=# VACUUM FULL pg_largeobject;

./tmp_install/usr/local/pgsql/bin/initdb --no-sync -D pg15b2.dat -k
./tmp_install/usr/local/pgsql/bin/postgres -D pg15b2.dat -c logging_collector=no -p 5678 -k /tmp&
postgres=# VACUUM FULL pg_database;

tmp_install/usr/local/pgsql/bin/pg_upgrade -D pg15b2.dat -b tmp_install/usr/local/pgsql/bin -d pg15b1.dat

postgres=# SELECT COUNT(1), pg_relation_filenode(oid), array_agg(relname)
           FROM pg_class WHERE pg_relation_filenode(oid) IS NOT NULL
           GROUP BY 2 HAVING COUNT(1)>1 ORDER BY 1 DESC;
 count | pg_relation_filenode |                     array_agg
-------+----------------------+----------------------------------------------------
     2 |                16388 | {pg_toast_1262_index,pg_largeobject_loid_pn_index}

I don't have a deep understanding of why the DB hasn't imploded at this
point, maybe related to the filenode map file, but it seems very close to
being catastrophic.

It seems like pg_upgrade should at least check that the new cluster has no
objects with either OID or relfilenodes in the user range..

You could blame my patch, since the issue is limited to pg_largeobject.

> Perhaps a better solution to this particular problem is to remove the
> backing files for the large object table and index *before* restoring
> the dump, deciding what files to remove by asking the running server
> for the file path. It might seem funny to allow for dangling pg_class
> entries, but we're going to create that situation for all other user
> rels anyway, and pg_upgrade treats pg_largeobject as a user rel.

I'll think about it more later.

--
Justin
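A sketch of the minimal check Justin mentions, in the same style as his query
above, run against each database of the new cluster before anything is
restored:

    SELECT oid::regclass AS rel, oid, relfilenode
    FROM pg_class
    WHERE oid >= 16384 OR relfilenode >= 16384;
    -- a pristine, freshly initdb'd cluster should return no rows; any hit
    -- means an OID or relfilenode in the user range is already taken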
On Fri, Jul 8, 2022 at 11:53 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> pg_upgrade drops template1 and postgres before upgrading:

Hmm, but I bet you could fiddle with template0. Indeed, what's the difference
between a user fiddling with template0 and me committing a patch that bumps
catversion? If the latter doesn't prevent pg_upgrade from working when the
release comes out, why should the former?

> I don't have a deep understanding of why the DB hasn't imploded at this
> point, maybe related to the filenode map file, but it seems very close to
> being catastrophic.

Yeah, good example.

> It seems like pg_upgrade should at least check that the new cluster has no
> objects with either OID or relfilenodes in the user range..

Well... I think it's not quite that simple. There's an argument for that
rule, to be sure, but in some sense it's far too strict. We only preserve the
OIDs of tablespaces, types, enums, roles, and now relations. So if you create
any other type of object in the new cluster, like say a function, it's
totally fine. You could still fail if the old cluster happens to contain a
function with the same signature, but that's kind of a different issue. An
OID collision for any of the many object types for which OIDs are not
preserved is no problem at all.

But even if we restrict ourselves to talking about an object type for which
OIDs are preserved, it's still not quite that simple. For example, if I
create a relation in the new cluster, its OID or relfilenode might be the
same as a relation that exists in the old cluster. In such a case, a failure
is inevitable. We're definitely in big trouble, and the question is only
whether pg_upgrade will notice. But it's also possible that, either by
planning or by sheer dumb luck, neither the relation nor the OID that I've
created in the new cluster is in use in the old cluster. In such a case, the
upgrade can succeed without breaking anything, or at least nothing other than
our sense of order in the universe.

Without a doubt, there are holes in pg_upgrade's error checking that need to
be plugged, but I think there is room to debate exactly what size plug we
want to use. I can't really say that it's definitely stupid to use a plug
that's definitely big enough to catch all the scenarios that might break
stuff, but I think my own preference would be to try to craft it so that it
isn't too much larger than necessary. That's partly because I do think there
are some scenarios in which modifying the new cluster might be the easiest
way of working around some problem, but also because, as a matter of
principle, I like the idea of making rules that correspond to the real
dangers. If we write a rule that says essentially "it's no good if there are
two relations sharing a relfilenode," nobody with any understanding of how
the system works can argue that bypassing it is a sensible thing to do, and
probably nobody will even try, because it's so obviously bonkers to do so.
It's a lot less obviously bonkers to try to bypass the broader prohibition
which you suggest should never be bypassed, so someone may do it, and get
themselves in trouble.

Now I'm not saying such a person will get any sympathy from this list. If for
example you #if out the sanity check and hose yourself, people here,
including me, are going to suggest that you've hosed yourself and it's not
our problem. But ... the world is full of warnings about problems that aren't
really that serious, and sometimes those have the effect of discouraging
people from taking warnings about very serious problems as seriously as they
should. I know that I no longer panic when the national weather service texts
me to say that there's a tornado or a flash flood in my area. They've just
done that too many times when there was no real issue with which I needed to
be concerned. If I get caught out by a tornado at some point, they're
probably going to say "well that's why you should always take our warnings
seriously," but I'm going to say "well that's why you shouldn't send spurious
warnings."

> > Perhaps a better solution to this particular problem is to remove the
> > backing files for the large object table and index *before* restoring
> > the dump, deciding what files to remove by asking the running server
> > for the file path. It might seem funny to allow for dangling pg_class
> > entries, but we're going to create that situation for all other user
> > rels anyway, and pg_upgrade treats pg_largeobject as a user rel.
>
> I'll think about it more later.

Sounds good.

--
Robert Haas
EDB: http://www.enterprisedb.com
On Fri, Jul 08, 2022 at 10:44:07AM -0400, Robert Haas wrote:
> Thanks for checking over the reasoning, and the kind words in general.

Thanks for fixing the main issue.

> I just committed Justin's fix for the bug, without fixing the fact
> that the new cluster's original pg_largeobject files will be left
> orphaned afterward. That's a relatively minor problem by comparison,
> and it seemed best to me not to wait too long to get the main issue
> addressed.

Hmm. That would mean that the more LOs a cluster has, the more bloat there
will be in the new cluster once the upgrade is done. That could be quite a
few gigs worth of data laying around depending on the data inserted in the
source cluster, and we don't have a way to know which files to remove
post-upgrade, do we?

--
Michael
Attachment
On Sun, Jul 10, 2022 at 9:31 PM Michael Paquier <michael@paquier.xyz> wrote:
> Hmm. That would mean that the more LOs a cluster has, the more bloat
> there will be in the new cluster once the upgrade is done. That could
> be quite a few gigs worth of data laying around depending on the data
> inserted in the source cluster, and we don't have a way to know which
> files to remove post-upgrade, do we?

The files that are being leaked here are the files backing the pg_largeobject
table and the corresponding index as they existed in the new cluster just
prior to the upgrade. Hopefully, the table is a zero-length file and the
index is just one block, because you're supposed to use a newly-initdb'd
cluster as the target for a pg_upgrade operation. Now, you don't actually
have to do that: as we've been discussing, there aren't as many sanity checks
in this code as there probably should be. But it would still be surprising to
initdb a new cluster, load gigabytes of large objects into it, and then use
it as the target cluster for a pg_upgrade.

As for whether it's possible to know which files to remove post-upgrade,
that's the same problem as trying to figure out whether there are orphaned
files in any other PostgreSQL cluster. We don't have a tool for it, but if
you're sure that the system is more or less quiescent - no uncommitted DDL,
in particular - you can compare pg_class.relfilenode to the actual filesystem
contents and figure out what extra stuff is present on the filesystem level.

I am not saying we shouldn't try to fix this up more thoroughly, just that I
think you are overestimating the consequences.

--
Robert Haas
EDB: http://www.enterprisedb.com
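A rough version of the comparison Robert describes, to be diffed against a
directory listing of base/<dboid>/ (a sketch only; a real check would also
have to account for relation forks, extra segment files, and mapped
relations):

    SELECT relname, pg_relation_filenode(oid) AS filenode
    FROM pg_class
    WHERE pg_relation_filenode(oid) IS NOT NULL
    ORDER BY 2;
    -- filenodes present on disk but absent from this list are candidates
    -- for being orphaned files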
On Mon, Jul 11, 2022 at 9:16 AM Robert Haas <robertmhaas@gmail.com> wrote:
> I am not saying we shouldn't try to fix this up more thoroughly, just
> that I think you are overestimating the consequences.

I spent a bunch of time looking at this today and I have more sympathy for
Justin's previous proposal now. I found it somewhat hacky that he was relying
on the hard-coded values of LargeObjectRelationId and
LargeObjectLOidPNIndexId, but I discovered that it's harder to do better than
I had assumed. Suppose we don't want to compare against a hard-coded constant
but against the value that is actually present before the dump overwrites the
pg_class row's relfilenode. Well, we can't get that value from the database
in question before restoring the dump, because restoring the dump either
creates or recreates the database in all cases. The CREATE DATABASE command
that will be part of the dump always specifies TEMPLATE template0, so if we
want something other than a hard-coded constant, we need the
pg_class.relfilenode values from template0 for pg_largeobject and
pg_largeobject_loid_pn_index. But we can't connect to that database to query
those values, because it has datallowconn = false. Oops.

I have a few more ideas to try here. It occurs to me that we could fix this
more cleanly if we could get the dump itself to set the relfilenode for
pg_largeobject to the desired value. Right now, it's just overwriting the
relfilenode stored in the catalog without actually doing anything that would
cause a change on disk. But if we could make it change the relfilenode in a
more principled way that would actually cause an on-disk change, then the
orphaned-file problem would be fixed, because we'd always be installing the
new file over top of the old file. I'm going to investigate how hard it would
be to make that work.

--
Robert Haas
EDB: http://www.enterprisedb.com
On Tue, Jul 12, 2022 at 04:51:44PM -0400, Robert Haas wrote:
> I spent a bunch of time looking at this today and I have more sympathy
> for Justin's previous proposal now. I found it somewhat hacky that he
> was relying on the hard-coded values of LargeObjectRelationId and
> LargeObjectLOidPNIndexId, but I discovered that it's harder to do
> better than I had assumed. Suppose we don't want to compare against a
> hard-coded constant but against the value that is actually present
> before the dump overwrites the pg_class row's relfilenode. Well, we
> can't get that value from the database in question before restoring
> the dump, because restoring the dump either creates or recreates the
> database in all cases. The CREATE DATABASE command that will be part
> of the dump always specifies TEMPLATE template0, so if we want
> something other than a hard-coded constant, we need the
> pg_class.relfilenode values from template0 for pg_largeobject and
> pg_largeobject_loid_pn_index. But we can't connect to that database to
> query those values, because it has datallowconn = false. Oops.
>
> I have a few more ideas to try here. It occurs to me that we could fix
> this more cleanly if we could get the dump itself to set the
> relfilenode for pg_largeobject to the desired value. Right now, it's
> just overwriting the relfilenode stored in the catalog without
> actually doing anything that would cause a change on disk. But if we
> could make it change the relfilenode in a more principled way that
> would actually cause an on-disk change, then the orphaned-file problem
> would be fixed, because we'd always be installing the new file over
> top of the old file. I'm going to investigate how hard it would be to
> make that work.

Thanks for all the details here. This originally sounded like the new cluster
was keeping around some orphaned relation files with the old LOs still stored
in them. But as that's just the freshly initdb'd relfilenodes of
pg_largeobject, that does not strike me as something absolutely critical to
fix for v15, as orphaned relfilenodes are an existing problem. If we finish
with a solution rather simple in design, I'd be fine to stick a fix in
REL_15_STABLE, but playing with this stable branch more than necessary may be
risky after beta2. At the end, I would be fine to drop the open item now that
the main issue has been fixed.

--
Michael
Attachment
On Tue, Jul 12, 2022 at 4:51 PM Robert Haas <robertmhaas@gmail.com> wrote:
> I have a few more ideas to try here. It occurs to me that we could fix
> this more cleanly if we could get the dump itself to set the
> relfilenode for pg_largeobject to the desired value. Right now, it's
> just overwriting the relfilenode stored in the catalog without
> actually doing anything that would cause a change on disk. But if we
> could make it change the relfilenode in a more principled way that
> would actually cause an on-disk change, then the orphaned-file problem
> would be fixed, because we'd always be installing the new file over
> top of the old file. I'm going to investigate how hard it would be to
> make that work.

Well, it took a while to figure out how to make that work, but I believe I've
got it now. Attached please find a couple of patches that should get the job
done. They might need a bit of polish, but I think the basic concepts are
sound.

My first thought was to have the dump issue VACUUM FULL pg_largeobject after
first calling binary_upgrade_set_next_heap_relfilenode() and
binary_upgrade_set_next_index_relfilenode(), and have the VACUUM FULL use the
values configured by those calls for the new heap and index OID. I got this
working in standalone testing, only to find that this doesn't work inside
pg_upgrade. The complaint is "ERROR: VACUUM cannot run inside a transaction
block", and I think that happens because pg_restore sends the entire TOC
entry for a single object to the server as a single query, and here it
contains multiple SQL commands. That multi-command string ends up being
treated like an implicit transaction block.

So my second thought was to have the dump still call
binary_upgrade_set_next_heap_relfilenode() and
binary_upgrade_set_next_index_relfilenode(), but then afterwards call
TRUNCATE rather than VACUUM FULL. I found that a simple change to
RelationSetNewRelfilenumber() did the trick: I could then easily change the
heap and index relfilenodes for pg_largeobject to any new values I liked.
However, I realized that this approach had a problem: what if the
pg_largeobject relation had never been rewritten in the old cluster? Then the
heap and index relfilenodes would be unchanged. It's also possible that
someone might have run REINDEX in the old cluster, in which case it might
happen that the heap relfilenode was unchanged, but the index relfilenode had
changed. I spent some time fumbling around with trying to get the
non-transactional truncate path to cover these cases, but the fact that we
might need to change the relfilenode for the index but not the heap makes
this approach seem pretty awful.

But it occurred to me that in the case of a pg_upgrade, we don't really need
to keep the old storage around until commit time. We can unlink it first,
before creating the new storage, and then if the old and new storage happen
to be the same, it still works. I can think of two possible objections to
this way forward. First, it means that the operation isn't properly
transactional. However, if the upgrade fails at this stage, the new cluster
is going to have to be nuked and recreated anyway, so the fact that things
might be in an unclean state after an ERROR is irrelevant. Second, one might
wonder whether such a fix is really sufficient. For example, what happens if
the relfilenode allocated to pg_largeobject in the old cluster is assigned to
its index in the new cluster, and vice versa? To make that work, we would
need to remove all storage for all relfilenodes first, and then recreate them
all afterward. However, I don't think we need to make that work. If an object
in the old cluster has a relfilenode < 16384, then that should mean it's
never been rewritten and therefore its relfilenode in the new cluster should
be the same. The only way this wouldn't be true is if we shuffled around the
initial relfilenode assignments in a new version of PG so that the same
values were used but now for different objects, which would be a really dumb
idea. And on the other hand, if the object in the old cluster has a
relfilenode > 16384, then that relfilenode value should be unused in the new
cluster. If not, the user has tinkered with the new cluster more than they
ought.

So I tried implementing this but I didn't get it quite right the first time.
It's not enough to call smgrdounlinkall() instead of RelationDropStorage(),
because just as RelationDropStorage() does not actually drop the storage but
only schedules it to be dropped later, so also smgrdounlinkall() does not in
fact unlink all, but only some. It leaves the first segment of the main
relation fork around to guard against the hazards discussed in the header
comments for mdunlink(). But those hazards don't really matter here either,
because, again, any failure will necessarily require that the entire new
cluster be destroyed, and also because there shouldn't be any concurrent
activity in the new cluster while pg_upgrade is running. So I adjusted
smgrdounlinkall() to actually remove everything when IsBinaryUpgrade = true.
And then it all seems to work: pg_upgrade of a cluster that has had a rewrite
of pg_largeobject works, and pg_upgrade of a cluster that has not had such a
rewrite works too. Wa-hoo.

As to whether this is a good fix, I think someone could certainly argue
otherwise. This is all a bit grotty. However, I don't find it all that bad.
As long as we're moving files between one PG cluster and another using an
external tool rather than logic inside the server itself, I think we're bound
to have some hacks someplace to make it all work. To me, extending them to a
few more places to avoid leaving files behind on disk seems like a good
trade-off. Your mileage may vary.

--
Robert Haas
EDB: http://www.enterprisedb.com
Attachment
Hi,

On 2022-07-18 14:57:40 -0400, Robert Haas wrote:
> As to whether this is a good fix, I think someone could certainly
> argue otherwise. This is all a bit grotty. However, I don't find it
> all that bad. As long as we're moving files between one PG
> cluster and another using an external tool rather than logic inside
> the server itself, I think we're bound to have some hacks someplace to
> make it all work. To me, extending them to a few more places to avoid
> leaving files behind on disk seems like a good trade-off. Your mileage
> may vary.

How about adding a new binary_upgrade_* helper function for this purpose,
instead of tying it into truncate?

Greetings,

Andres Freund
On Mon, Jul 18, 2022 at 02:57:40PM -0400, Robert Haas wrote:
> So I tried implementing this but I didn't get it quite right the first
> time. It's not enough to call smgrdounlinkall() instead of
> RelationDropStorage(), because just as RelationDropStorage() does not
> actually drop the storage but only schedules it to be dropped later,
> so also smgrdounlinkall() does not in fact unlink all, but only some.
> It leaves the first segment of the main relation fork around to guard
> against the hazards discussed in the header comments for mdunlink().
> But those hazards don't really matter here either, because, again, any
> failure will necessarily require that the entire new cluster be
> destroyed, and also because there shouldn't be any concurrent activity
> in the new cluster while pg_upgrade is running. So I adjusted
> smgrdounlinkall() to actually remove everything when IsBinaryUpgrade =
> true. And then it all seems to work: pg_upgrade of a cluster that has
> had a rewrite of pg_largeobject works, and pg_upgrade of a cluster
> that has not had such a rewrite works too. Wa-hoo.

Using the IsBinaryUpgrade flag makes sense to me.

--
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Indecision is a decision.  Inaction is an action.  Mark Batterson
On Mon, Jul 18, 2022 at 4:06 PM Andres Freund <andres@anarazel.de> wrote:
> How about adding a new binary_upgrade_* helper function for this purpose,
> instead of tying it into truncate?

I considered that briefly, but it would need to do a lot of things that
TRUNCATE already knows how to do, so it does not seem like a good idea.

--
Robert Haas
EDB: http://www.enterprisedb.com
On Mon, Jul 18, 2022 at 2:57 PM Robert Haas <robertmhaas@gmail.com> wrote:
> Well, it took a while to figure out how to make that work, but I
> believe I've got it now. Attached please find a couple of patches that
> should get the job done. They might need a bit of polish, but I think
> the basic concepts are sound.

So, would people like these patches (1) committed to master only, (2)
committed to master and back-patched into v15, or (3) not committed at all?

Michael argued upthread that it was too risky to be tinkering with things at
this stage in the release cycle and, certainly, the more time goes by, the
more true that gets. But I'm not convinced that these patches involve an
inordinate degree of risk, and using beta as a time to fix things that turned
out to be buggy is part of the point of the whole thing. On the other hand,
the underlying issue isn't that serious either, and nobody seems to have
reviewed the patches in detail, either. I don't mind committing them on my
own recognizance if that's what people would prefer; I can take
responsibility for fixing anything that is further broken, as I suppose would
be expected even if someone else did review. But, I don't want to do
something that other people feel is the wrong thing to have done.

--
Robert Haas
EDB: http://www.enterprisedb.com
On Tue, Jul 26, 2022 at 03:45:11PM -0400, Robert Haas wrote:
> On Mon, Jul 18, 2022 at 2:57 PM Robert Haas <robertmhaas@gmail.com> wrote:
> > Well, it took a while to figure out how to make that work, but I
> > believe I've got it now. Attached please find a couple of patches that
> > should get the job done. They might need a bit of polish, but I think
> > the basic concepts are sound.
>
> So, would people like these patches (1) committed to master only, (2)
> committed to master and back-patched into v15, or (3) not committed at
> all? Michael argued upthread that it was too risky to be tinkering
> with things at this stage in the release cycle and, certainly, the
> more time goes by, the more true that gets. But I'm not convinced that
> these patches involve an inordinate degree of risk, and using beta as
> a time to fix things that turned out to be buggy is part of the point
> of the whole thing. On the other hand, the underlying issue isn't that
> serious either, and nobody seems to have reviewed the patches in
> detail, either. I don't mind committing them on my own recognizance if
> that's what people would prefer; I can take responsibility for fixing
> anything that is further broken, as I suppose would be expected even
> if someone else did review. But, I don't want to do something that
> other people feel is the wrong thing to have done.

This behavior is new in PG 15, and I would be worried to have one new
behavior in PG 15 and another one in PG 16. Therefore, I would like to see it
in PG 15 and master. I also think not doing anything and leaving these
zero-length files around would also be risky.

--
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Indecision is a decision.  Inaction is an action.  Mark Batterson
On Tue, Jul 26, 2022 at 9:09 PM Bruce Momjian <bruce@momjian.us> wrote:
> This behavior is new in PG 15, and I would be worried to have one new
> behavior in PG 15 and another one in PG 16. Therefore, I would like to
> see it in PG 15 and master.

That's also my preference, so committed and back-patched to v15.

--
Robert Haas
EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes: > That's also my preference, so committed and back-patched to v15. crake has been failing its cross-version-upgrade tests [1] since this went in: log files for step xversion-upgrade-REL9_4_STABLE-HEAD: ==~_~===-=-===~_~== /home/andrew/bf/root/upgrade.crake/HEAD/REL9_4_STABLE-amcheck-1.log ==~_~===-=-===~_~== heap table "regression.pg_catalog.pg_largeobject", block 0, offset 7: xmin 7707 precedes relation freeze threshold 0:14779 heap table "regression.pg_catalog.pg_largeobject", block 201, offset 5: xmin 8633 precedes relation freeze threshold 0:14779 I'm not very sure what to make of that, but it's failed identically four times in four attempts. regards, tom lane [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=crake&dt=2022-07-28%2020%3A33%3A20
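The "relation freeze threshold" in those amcheck complaints is pg_class.relfrozenxid (and relminmxid for multixacts). For anyone poking at a cluster by hand, the values pg_upgrade carried over can be inspected with a query along these lines (illustrative only, not part of the test):

    SELECT oid::regclass AS rel, relfrozenxid, relminmxid, age(relfrozenxid)
    FROM pg_catalog.pg_class
    WHERE oid = 'pg_catalog.pg_largeobject'::regclass;

A relfrozenxid newer than the xmin of tuples still present in the table is exactly the inconsistency amcheck is reporting above.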
On Fri, Jul 29, 2022 at 1:49 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > crake has been failing its cross-version-upgrade tests [1] since > this went in: > > log files for step xversion-upgrade-REL9_4_STABLE-HEAD: > ==~_~===-=-===~_~== /home/andrew/bf/root/upgrade.crake/HEAD/REL9_4_STABLE-amcheck-1.log ==~_~===-=-===~_~== > heap table "regression.pg_catalog.pg_largeobject", block 0, offset 7: > xmin 7707 precedes relation freeze threshold 0:14779 > heap table "regression.pg_catalog.pg_largeobject", block 201, offset 5: > xmin 8633 precedes relation freeze threshold 0:14779 > > I'm not very sure what to make of that, but it's failed identically > four times in four attempts. That's complaining about two tuples in the pg_largeobject table with xmin values that precede relfrozenxid -- which suggests that even after 80d6907219, relfrozenxid isn't being correctly preserved in this test case, since the last run still failed the same way. But what exactly is this test case testing? I've previously complained about buildfarm outputs not being labelled as well as they need to be in order to be easily understood by, well, me anyway. It'd sure help if the commands that led up to this problem were included in the output. I downloaded latest-client.tgz from the build farm server and am looking at TestUpgradeXversion.pm, but there's no mention of -amcheck-1.log in there, just -analyse.log, -copy.log, and following. So I suppose this is running some different code or special configuration... -- Robert Haas EDB: http://www.enterprisedb.com
On Fri, Jul 29, 2022 at 2:35 PM Robert Haas <robertmhaas@gmail.com> wrote: > But what exactly is this test case testing? I've previously complained > about buildfarm outputs not being as labelled as well as they need to > be in order to be easily understood by, well, me anyway. It'd sure > help if the commands that led up to this problem were included in the > output. I downloaded latest-client.tgz from the build farm server and > am looking at TestUpgradeXversion.pm, but there's no mention of > -amcheck-1.log in there, just -analyse.log, -copy.log, and following. > So I suppose this is running some different code or special > configuration... I was able to reproduce the problem by running 'make installcheck' against a 9.4 instance and then doing a pg_upgrade to 16devel (which took many tries because it told me about many different kinds of things that it didn't like one at a time; I just dropped objects from the regression DB until it worked). The dump output looks like this: -- For binary upgrade, set pg_largeobject relfrozenxid and relminmxid UPDATE pg_catalog.pg_class SET relfrozenxid = '0', relminmxid = '0' WHERE oid = 2683; UPDATE pg_catalog.pg_class SET relfrozenxid = '990', relminmxid = '1' WHERE oid = 2613; -- For binary upgrade, preserve pg_largeobject and index relfilenodes SELECT pg_catalog.binary_upgrade_set_next_index_relfilenode('12364'::pg_catalog.oid); SELECT pg_catalog.binary_upgrade_set_next_heap_relfilenode('12362'::pg_catalog.oid); TRUNCATE pg_catalog.pg_largeobject; However, the catalogs show the relfilenode being correct, and the relfrozenxid set to a larger value. I suspect the problem here is that this needs to be done in the other order, with the TRUNCATE first and the update to the pg_class columns afterward. I think I better look into improving the TAP tests for this, too. -- Robert Haas EDB: http://www.enterprisedb.com
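Spelled out, the reordering being suggested here looks roughly like this, reusing the statements and values from the dump fragment above (a sketch only; the real pg_dump --binary-upgrade output differs in detail). The TRUNCATE has to run first because it assigns pg_largeobject a fresh relfrozenxid and relminmxid, which would otherwise overwrite whatever the UPDATEs had just installed:

    -- For binary upgrade, preserve pg_largeobject and index relfilenodes
    SELECT pg_catalog.binary_upgrade_set_next_index_relfilenode('12364'::pg_catalog.oid);
    SELECT pg_catalog.binary_upgrade_set_next_heap_relfilenode('12362'::pg_catalog.oid);
    TRUNCATE pg_catalog.pg_largeobject;

    -- Only afterwards restore the horizons taken from the old cluster
    UPDATE pg_catalog.pg_class SET relfrozenxid = '0', relminmxid = '0' WHERE oid = 2683;
    UPDATE pg_catalog.pg_class SET relfrozenxid = '990', relminmxid = '1' WHERE oid = 2613;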
On Fri, Jul 29, 2022 at 3:10 PM Robert Haas <robertmhaas@gmail.com> wrote: > However, the catalogs show the relfilenode being correct, and the > relfrozenxid set to a larger value. I suspect the problem here is that > this needs to be done in the other order, with the TRUNCATE first and > the update to the pg_class columns afterward. That fix appears to be correct. Patch attached. > I think I better look into improving the TAP tests for this, too. TAP test enhancement also included in the attached patch. -- Robert Haas EDB: http://www.enterprisedb.com
Attachment
Robert Haas <robertmhaas@gmail.com> writes: > On Fri, Jul 29, 2022 at 3:10 PM Robert Haas <robertmhaas@gmail.com> wrote: >> However, the catalogs show the relfilenode being correct, and the >> relfrozenxid set to a larger value. I suspect the problem here is that >> this needs to be done in the other order, with the TRUNCATE first and >> the update to the pg_class columns afterward. > That fix appears to be correct. Patch attached. Looks plausible. regards, tom lane
On Fri, Jul 29, 2022 at 4:00 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Looks plausible. Committed. -- Robert Haas EDB: http://www.enterprisedb.com
On 2022-07-29 Fr 14:35, Robert Haas wrote: > On Fri, Jul 29, 2022 at 1:49 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> crake has been failing its cross-version-upgrade tests [1] since >> this went in: >> >> log files for step xversion-upgrade-REL9_4_STABLE-HEAD: >> ==~_~===-=-===~_~== /home/andrew/bf/root/upgrade.crake/HEAD/REL9_4_STABLE-amcheck-1.log ==~_~===-=-===~_~== >> heap table "regression.pg_catalog.pg_largeobject", block 0, offset 7: >> xmin 7707 precedes relation freeze threshold 0:14779 >> heap table "regression.pg_catalog.pg_largeobject", block 201, offset 5: >> xmin 8633 precedes relation freeze threshold 0:14779 >> >> I'm not very sure what to make of that, but it's failed identically >> four times in four attempts. > That's complaining about two tuples in the pg_largeobject table with > xmin values that precedes relfrozenxid -- which suggests that even > after 80d6907219, relfrozenxid isn't being correctly preserved in this > test case, since the last run still failed the same way. > > But what exactly is this test case testing? I've previously complained > about buildfarm outputs not being as labelled as well as they need to > be in order to be easily understood by, well, me anyway. It'd sure > help if the commands that led up to this problem were included in the > output. I downloaded latest-client.tgz from the build farm server and > am looking at TestUpgradeXversion.pm, but there's no mention of > -amcheck-1.log in there, just -analyse.log, -copy.log, and following. > So I suppose this is running some different code or special > configuration... Not really, but it is running git bleeding edge. This code comes from <https://github.com/PGBuildFarm/client-code/commit/191df23bd25eb5546b0989d71ae92747151f9f39> at lines 704-705 cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
On Fri, Jul 29, 2022 at 5:13 PM Robert Haas <robertmhaas@gmail.com> wrote: > On Fri, Jul 29, 2022 at 4:00 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Looks plausible. > > Committed. wrasse just failed the new test: [00:09:44.167](0.001s) not ok 16 - old and new horizons match after pg_upgrade [00:09:44.167](0.001s) [00:09:44.167](0.000s) # Failed test 'old and new horizons match after pg_upgrade' # at t/002_pg_upgrade.pl line 345. [00:09:44.168](0.000s) # got: '1' # expected: '0' === diff of /export/home/nm/farm/studio64v12_6/HEAD/pgsql.build/src/bin/pg_upgrade/tmp_check/tmp_test_D3cJ/horizon1.txt and /export/home/nm/farm/studio64v12_6/HEAD/pgsql.build/src/bin/pg_upgrade/tmp_check/tmp_test_D3cJ/horizon2.txt === stdout === 1c1 < pg_backend_pid|21767 --- > pg_backend_pid|22045=== stderr === === EOF === I'm slightly befuddled as to how we're ending up with a table named pg_backend_pid. That said, perhaps this is just a case of needing to prevent autovacuum from running on the new cluster before we've had a chance to record the horizons? But I'm not very confident in that explanation at this point. -- Robert Haas EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes: > wrasse just failed the new test: > [00:09:44.167](0.001s) not ok 16 - old and new horizons match after pg_upgrade > [00:09:44.167](0.001s) > [00:09:44.167](0.000s) # Failed test 'old and new horizons match > after pg_upgrade' > # at t/002_pg_upgrade.pl line 345. > [00:09:44.168](0.000s) # got: '1' > # expected: '0' > === diff of /export/home/nm/farm/studio64v12_6/HEAD/pgsql.build/src/bin/pg_upgrade/tmp_check/tmp_test_D3cJ/horizon1.txt > and /export/home/nm/farm/studio64v12_6/HEAD/pgsql.build/src/bin/pg_upgrade/tmp_check/tmp_test_D3cJ/horizon2.txt > === stdout === > 1c1 > < pg_backend_pid|21767 > --- > > pg_backend_pid|22045=== stderr === > === EOF === > I'm slightly befuddled as to how we're ending up with a table named > pg_backend_pid. That's not the only thing weird about this printout: there should be three columns not two in that query's output, and what happened to the trailing newline? I don't think we're looking at desired output at all. I am suspicious that the problem stems from the nonstandard way you've invoked psql to collect the horizon data. At the very least this code is duplicating bits of Cluster::psql that it'd be better not to; and I wonder if the issue is that it's not duplicating enough. The lack of -X and the lack of use of installed_command() are red flags. Do you really need to do it like this? regards, tom lane
On Fri, Jul 29, 2022 at 7:16 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > That's not the only thing weird about this printout: there should be > three columns not two in that query's output, and what happened to > the trailing newline? I don't think we're looking at desired > output at all. Well, that's an awfully good point. > I am suspicious that the problem stems from the nonstandard > way you've invoked psql to collect the horizon data. At the very > least this code is duplicating bits of Cluster::psql that it'd be > better not to; and I wonder if the issue is that it's not duplicating > enough. The lack of -X and the lack of use of installed_command() > are red flags. Do you really need to do it like this? Well, I just copied the pg_dump block which occurs directly beforehand and modified it. I think that must take care of setting the path properly, else we'd have things blowing up all over the place. But the lack of -X could be an issue. The missing newline thing happens for me locally too, if I revert the bug fix portion of that commit, but I do seem to get the right columns in the output. It looks like this: 19:24:16.057](0.000s) not ok 16 - old and new horizons match after pg_upgrade [19:24:16.058](0.000s) [19:24:16.058](0.000s) # Failed test 'old and new horizons match after pg_upgrade' # at t/002_pg_upgrade.pl line 345. [19:24:16.058](0.000s) # got: '1' # expected: '0' === diff of /Users/rhaas/pgsql/src/bin/pg_upgrade/tmp_check/tmp_test_K8Fs/horizon1.txt and /Users/rhaas/pgsql/src/bin/pg_upgrade/tmp_check/tmp_test_K8Fs/horizon2.txt === stdout === 1c1 < pg_largeobject|718|1 --- > pg_largeobject|17518|3=== stderr === === EOF === [19:24:16.066](0.008s) 1..16 I can't account for the absence of a newline there, because hexdump says that both horizon1.txt and horizon2.txt end with one, and if I run diff on those two files and pipe the output into hexdump, it sees a newline at the end of that output too. -- Robert Haas EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes: > On Fri, Jul 29, 2022 at 7:16 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I am suspicious that the problem stems from the nonstandard >> way you've invoked psql to collect the horizon data. > Well, I just copied the pg_dump block which occurs directly beforehand > and modified it. I think that must take care of setting the path > properly, else we'd have things blowing up all over the place. But the > lack of -X could be an issue. Hmm. Now that I look, I do see two pre-existing "naked" invocations of psql in 002_pg_upgrade.pl, ie $oldnode->command_ok([ 'psql', '-X', '-f', $olddumpfile, 'postgres' ], 'loaded old dump file'); $oldnode->command_ok( [ 'psql', '-X', '-f', "$srcdir/src/bin/pg_upgrade/upgrade_adapt.sql", 'regression' ], 'ran adapt script'); Those suggest that maybe all you need is -X. However, I don't think either of those calls is reached by the majority of buildfarm animals, only ones that are doing cross-version-upgrade tests. So there could be more secret sauce needed to get this to pass everywhere. Personally I'd try to replace the two horizon-collection steps with $newnode->psql calls, using extra_params to inject the '-o' and target filename command line words. But if you want to try adding -X as a quicker answer, maybe that will be enough. regards, tom lane
On Fri, Jul 29, 2022 at 8:02 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Personally I'd try to replace the two horizon-collection steps with > $newnode->psql calls, using extra_params to inject the '-o' and target > filename command line words. But if you want to try adding -X as > a quicker answer, maybe that will be enough. Here's a patch that uses a variant of that approach: it just runs safe_psql straight up and gets the output, then writes it out to temp files if the output doesn't match and we need to run diff. Let me know what you think of this. While working on this, I noticed a few other problems. One is that the query doesn't have an ORDER BY clause, which it really should, or the output won't be stable. And the other is that I think we should be testing against the regression database, not the postgres database, because it's got a bunch of user tables in it, not just pg_largeobject. -- Robert Haas EDB: http://www.enterprisedb.com
Attachment
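In outline, the comparison in the attached patch amounts to something like the following (a simplified sketch, not the exact code in 002_pg_upgrade.pl; variable names and the exact query are illustrative):

    # Same query against the old and the new cluster; the ORDER BY keeps
    # the captured output stable across runs.
    my $horizon_query =
        "SELECT relname, relfrozenxid, relminmxid "
      . "FROM pg_catalog.pg_class WHERE relkind IN ('r', 'm') "
      . "ORDER BY relname";
    my $old_horizons = $oldnode->safe_psql('regression', $horizon_query);
    # ... pg_upgrade runs in between ...
    my $new_horizons = $newnode->safe_psql('regression', $horizon_query);
    is($new_horizons, $old_horizons,
        'old and new horizons match after pg_upgrade');

Since safe_psql() goes through Cluster::psql, it also picks up -X and the other standard psql options instead of reinventing the invocation.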
Robert Haas <robertmhaas@gmail.com> writes: > Here's a patch that uses a variant of that approach: it just runs > safe_psql straight up and gets the output, then writes it out to temp > files if the output doesn't match and we need to run diff. Let me know > what you think of this. That looks good to me, although obviously I don't know for sure if it will make wrasse happy. > While working on this, I noticed a few other problems. One is that the > query doesn't have an ORDER BY clause, which it really should, or the > output won't be stable. And the other is that I think we should be > testing against the regression database, not the postgres database, > because it's got a bunch of user tables in it, not just > pg_largeobject. Both of those sound like "d'oh" observations to me. +1 regards, tom lane
On Fri, Jul 29, 2022 at 07:16:34PM -0400, Tom Lane wrote: > Robert Haas <robertmhaas@gmail.com> writes: > > wrasse just failed the new test: > > > [00:09:44.167](0.001s) not ok 16 - old and new horizons match after pg_upgrade > > [00:09:44.167](0.001s) > > [00:09:44.167](0.000s) # Failed test 'old and new horizons match > > after pg_upgrade' > > # at t/002_pg_upgrade.pl line 345. > > [00:09:44.168](0.000s) # got: '1' > > # expected: '0' > > === diff of /export/home/nm/farm/studio64v12_6/HEAD/pgsql.build/src/bin/pg_upgrade/tmp_check/tmp_test_D3cJ/horizon1.txt > > and /export/home/nm/farm/studio64v12_6/HEAD/pgsql.build/src/bin/pg_upgrade/tmp_check/tmp_test_D3cJ/horizon2.txt > > === stdout === > > 1c1 > > < pg_backend_pid|21767 > > --- > > > pg_backend_pid|22045=== stderr === > > === EOF === > > > I'm slightly befuddled as to how we're ending up with a table named > > pg_backend_pid. > The lack of -X and the lack of use of installed_command() > are red flags. The pg_backend_pid is from "SELECT pg_catalog.pg_backend_pid();" in ~/.psqlrc, so the lack of -X caused that. The latest commit fixes things on a normal GNU/Linux box, so I bet it will fix wrasse. (thorntail managed not to fail that way. For unrelated reasons, I override thorntail's $HOME to a mostly-empty directory.)
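For anyone reproducing this: with a ~/.psqlrc like the one Noah describes, the difference is just the -X switch (illustrative invocations, not the test's exact command lines):

    psql regression -c 'SELECT relname, relfrozenxid, relminmxid FROM pg_class ORDER BY relname'     # also runs ~/.psqlrc; its output leaks into the capture
    psql -X regression -c 'SELECT relname, relfrozenxid, relminmxid FROM pg_class ORDER BY relname'  # --no-psqlrc: only the query's own output appears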
Noah Misch <noah@leadboat.com> writes: > The pg_backend_pid is from "SELECT pg_catalog.pg_backend_pid();" in ~/.psqlrc, > so the lack of -X caused that. The latest commit fixes things on a normal > GNU/Linux box, so I bet it will fix wrasse. Yup, looks like we're all good now. Thanks! regards, tom lane
On 7/30/22 10:40 AM, Tom Lane wrote: > Noah Misch <noah@leadboat.com> writes: >> The pg_backend_pid is from "SELECT pg_catalog.pg_backend_pid();" in ~/.psqlrc, >> so the lack of -X caused that. The latest commit fixes things on a normal >> GNU/Linux box, so I bet it will fix wrasse. > > Yup, looks like we're all good now. Thanks! Given this appears to be resolved, I have removed this from "Open Items". Thanks! Jonathan
Attachment
"Jonathan S. Katz" <jkatz@postgresql.org> writes: > Given this appears to be resolved, I have removed this from "Open > Items". Thanks! Sadly, we're still not out of the woods. I see three buildfarm failures in this test since Robert resolved the "-X" problem [1][2][3]: grassquit reported [19:34:15.249](0.001s) not ok 14 - old and new horizons match after pg_upgrade [19:34:15.249](0.001s) [19:34:15.249](0.000s) # Failed test 'old and new horizons match after pg_upgrade' # at t/002_pg_upgrade.pl line 336. === diff of /mnt/resource/bf/build/grassquit/REL_15_STABLE/pgsql.build/src/bin/pg_upgrade/tmp_check/tmp_test_z3zV/horizon1.txtand /mnt/resource/bf/build/grassquit/REL_15_STABLE/pgsql.build/src/bin/pg_upgrade/tmp_check/tmp_test_z3zV/horizon2.txt === stdout === 785c785 < spgist_point_tbl|7213|1 --- > spgist_point_tbl|7356|3 787c787 < spgist_text_tbl|7327|1 --- > spgist_text_tbl|8311|3=== stderr === === EOF === wrasse reported [06:36:35.834](0.001s) not ok 14 - old and new horizons match after pg_upgrade [06:36:35.835](0.001s) [06:36:35.835](0.000s) # Failed test 'old and new horizons match after pg_upgrade' # at t/002_pg_upgrade.pl line 336. === diff of /export/home/nm/farm/studio64v12_6/REL_15_STABLE/pgsql.build/src/bin/pg_upgrade/tmp_check/tmp_test_mLle/horizon1.txtand /export/home/nm/farm/studio64v12_6/REL_15_STABLE/pgsql.build/src/bin/pg_upgrade/tmp_check/tmp_test_mLle/horizon2.txt === stdout === 138c138 < delete_test_table|7171|1 --- > delete_test_table|7171|3=== stderr === Warning: missing newline at end of file /export/home/nm/farm/studio64v12_6/REL_15_STABLE/pgsql.build/src/bin/pg_upgrade/tmp_check/tmp_test_mLle/horizon1.txt Warning: missing newline at end of file /export/home/nm/farm/studio64v12_6/REL_15_STABLE/pgsql.build/src/bin/pg_upgrade/tmp_check/tmp_test_mLle/horizon2.txt=== EOF=== conchuela doesn't seem to have preserved the detailed log, but it failed at the same place: # Failed test 'old and new horizons match after pg_upgrade' # at t/002_pg_upgrade.pl line 336. Not sure what to make of this, except that maybe the test is telling us about an actual bug of exactly the kind it's designed to expose. regards, tom lane [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=grassquit&dt=2022-08-01%2019%3A25%3A43 [2] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=wrasse&dt=2022-08-02%2004%3A18%3A18 [3] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=conchuela&dt=2022-08-02%2014%3A56%3A49
On 8/2/22 1:12 PM, Tom Lane wrote: > "Jonathan S. Katz" <jkatz@postgresql.org> writes: >> Given this appears to be resolved, I have removed this from "Open >> Items". Thanks! > > Sadly, we're still not out of the woods. I see three buildfarm > failures in this test since Robert resolved the "-X" problem [1][2][3]: > > Not sure what to make of this, except that maybe the test is telling > us about an actual bug of exactly the kind it's designed to expose. Looking at the test code, is there anything that could have changed the relfrozenxid or relminmxid independently of the test on these systems? That said, I do think we should reopen the item to figure out what's going on. Doing so now. Jonathan
Attachment
On Tue, Aug 2, 2022 at 1:12 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Not sure what to make of this, except that maybe the test is telling > us about an actual bug of exactly the kind it's designed to expose. That could be, but what would the bug be exactly? It's hard to think of a more direct way of setting relminmxid and relfrozenxid than updating pg_class. It doesn't seem realistic to suppose that we have a bug where setting a column in a system table to an integer value sometimes sets it to a slightly larger integer instead. If the values on the new cluster seemed like they had never been set, or if it seemed like they had been set to completely random values, then I'd suspect a bug in the mechanism, but here it seems more believable to me to think that we're actually setting the correct values and then something - maybe autovacuum - bumps them again before we have a chance to look at them. I'm not quite sure how to rule that theory in or out, though. -- Robert Haas EDB: http://www.enterprisedb.com
On 8/2/22 3:23 PM, Robert Haas wrote: > On Tue, Aug 2, 2022 at 1:12 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Not sure what to make of this, except that maybe the test is telling >> us about an actual bug of exactly the kind it's designed to expose. > > That could be, but what would the bug be exactly? It's hard to think > of a more direct way of setting relminmxid and relfrozenxid than > updating pg_class. It doesn't seem realistic to suppose that we have a > bug where setting a column in a system table to an integer value > sometimes sets it to a slightly larger integer instead. If the values > on the new cluster seemed like they had never been set, or if it > seemed like they had been set to completely random values, then I'd > suspect a bug in the mechanism, but here it seems more believable to > me to think that we're actually setting the correct values and then > something - maybe autovacuum - bumps them again before we have a > chance to look at them. FWIW (and I have not looked deeply at the code), I was thinking it could be something along those lines, given 1. the randomness of the underlying systems of the impacted farm animals and 2. it was only the three mentioned. > I'm not quite sure how to rule that theory in or out, though. Without overcomplicating this, are we able to check to see if autovacuum ran during the course of the test? Jonathan
Attachment
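Without overcomplicating the test, one way to answer that from inside the old cluster before it shuts down is the cumulative statistics view (a sketch; the server log, which Tom quotes a couple of messages below, is the other obvious source):

    SELECT schemaname, relname, last_autovacuum, autovacuum_count
    FROM pg_stat_all_tables
    WHERE autovacuum_count > 0
    ORDER BY last_autovacuum DESC NULLS LAST;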
"Jonathan S. Katz" <jkatz@postgresql.org> writes: > On 8/2/22 1:12 PM, Tom Lane wrote: >> Sadly, we're still not out of the woods. I see three buildfarm >> failures in this test since Robert resolved the "-X" problem [1][2][3]: > Looking at the test code, is there anything that could have changed the > relfrozenxid or relminxid independently of the test on these systems? Hmmm ... now that you mention it, I see nothing in 002_pg_upgrade.pl that attempts to turn off autovacuum on either the source server or the destination. So one plausible theory is that autovac moved the numbers since we checked. If that is the explanation, then it leaves us with few good options. I am not in favor of disabling autovacuum in the test: ordinary users are not going to do that while pg_upgrade'ing, so it'd make the test less representative of real-world usage, which seems like a bad idea. We could either drop this particular check again, or weaken it to allow new relfrozenxid >= old relfrozenxid, likewise relminxid. regards, tom lane
"Jonathan S. Katz" <jkatz@postgresql.org> writes: > On 8/2/22 3:23 PM, Robert Haas wrote: >> I'm not quite sure how to rule that theory in or out, though. > Without overcomplicating this, are we able to check to see if autovacuum > ran during the course of the test? Looks like we're all thinking along the same lines. grassquit shows this at the end of the old server's log, immediately after the query to retrieve the old horizons: 2022-08-01 19:33:41.608 UTC [1487114][postmaster][:0] LOG: received fast shutdown request 2022-08-01 19:33:41.611 UTC [1487114][postmaster][:0] LOG: aborting any active transactions 2022-08-01 19:33:41.643 UTC [1487114][postmaster][:0] LOG: background worker "logical replication launcher" (PID 1487132)exited with exit code 1 2022-08-01 19:33:41.643 UTC [1493875][autovacuum worker][5/6398:0] FATAL: terminating autovacuum process due to administratorcommand 2022-08-01 19:33:41.932 UTC [1487121][checkpointer][:0] LOG: checkpoint complete: wrote 1568 buffers (9.6%); 0 WAL file(s)added, 0 removed, 33 recycled; write=31.470 s, sync=0.156 s, total=31.711 s; sync files=893, longest=0.002 s, average=0.001s; distance=33792 kB, estimate=34986 kB 2022-08-01 19:33:41.933 UTC [1487121][checkpointer][:0] LOG: shutting down and wrasse shows this: 2022-08-02 06:35:01.974 CEST [5606:6] LOG: received fast shutdown request 2022-08-02 06:35:01.974 CEST [5606:7] LOG: aborting any active transactions 2022-08-02 06:35:01.975 CEST [6758:1] FATAL: terminating autovacuum process due to administrator command 2022-08-02 06:35:01.975 CEST [6758:2] CONTEXT: while vacuuming index "spgist_point_idx" of relation "public.spgist_point_tbl" 2022-08-02 06:35:01.981 CEST [5606:8] LOG: background worker "logical replication launcher" (PID 5612) exited with exitcode 1 2022-08-02 06:35:01.995 CEST [5607:42] LOG: shutting down While not smoking guns, these definitely prove that autovac was active. regards, tom lane
On 8/2/22 3:39 PM, Tom Lane wrote: > "Jonathan S. Katz" <jkatz@postgresql.org> writes: >> On 8/2/22 3:23 PM, Robert Haas wrote: >>> I'm not quite sure how to rule that theory in or out, though. > >> Without overcomplicating this, are we able to check to see if autovacuum >> ran during the course of the test? > > Looks like we're all thinking along the same lines. > > While not smoking guns, these definitely prove that autovac was active. > If that is the explanation, then it leaves us with few good options. > I am not in favor of disabling autovacuum in the test: ordinary > users are not going to do that while pg_upgrade'ing, so it'd make > the test less representative of real-world usage, which seems like > a bad idea. We could either drop this particular check again, or > weaken it to allow new relfrozenxid >= old relfrozenxid, likewise > relminxid. The test does look helpful and it would catch regressions. Loosely quoting Robert on a different point upthread, we don't want to turn off the alarm just because it's spuriously going off. I think the weakened check is OK (and possibly mimics the real-world where autovacuum runs), unless you see a major drawback to it? Jonathan
Attachment
"Jonathan S. Katz" <jkatz@postgresql.org> writes: > On 8/2/22 3:39 PM, Tom Lane wrote: >>> I am not in favor of disabling autovacuum in the test: ordinary >>> users are not going to do that while pg_upgrade'ing, so it'd make >>> the test less representative of real-world usage, which seems like >>> a bad idea. We could either drop this particular check again, or >>> weaken it to allow new relfrozenxid >= old relfrozenxid, likewise >>> relminxid. > The test does look helpful and it would catch regressions. Loosely > quoting Robert on a different point upthread, we don't want to turn off > the alarm just because it's spuriously going off. > I think the weakened check is OK (and possibly mimics the real-world > where autovacuum runs), unless you see a major drawback to it? I also think that ">=" is a sufficient requirement. It'd be a bit painful to test if we had to cope with potential XID wraparound, but we know that these installations haven't been around nearly long enough for that, so a plain ">=" test ought to be good enough. (Replacing the simple "eq" code with something that can handle that doesn't look like much fun, though.) regards, tom lane
On 8/2/22 3:51 PM, Tom Lane wrote: > "Jonathan S. Katz" <jkatz@postgresql.org> writes: >> On 8/2/22 3:39 PM, Tom Lane wrote: >>>> I am not in favor of disabling autovacuum in the test: ordinary >>>> users are not going to do that while pg_upgrade'ing, so it'd make >>>> the test less representative of real-world usage, which seems like >>>> a bad idea. We could either drop this particular check again, or >>>> weaken it to allow new relfrozenxid >= old relfrozenxid, likewise >>>> relminxid. > >> The test does look helpful and it would catch regressions. Loosely >> quoting Robert on a different point upthread, we don't want to turn off >> the alarm just because it's spuriously going off. >> I think the weakened check is OK (and possibly mimics the real-world >> where autovacuum runs), unless you see a major drawback to it? > > I also think that ">=" is a sufficient requirement. It'd be a > bit painful to test if we had to cope with potential XID wraparound, > but we know that these installations haven't been around nearly > long enough for that, so a plain ">=" test ought to be good enough. > (Replacing the simple "eq" code with something that can handle that > doesn't look like much fun, though.) ...if these systems are hitting XID wraparound, we have another issue to worry about. I started modifying the test to support this behavior, but thought that because 1. we want to ensure the OID is still equal and 2. in the examples you showed, both relfrozenxid or relminxid could increment, we may want to have the individual checks on each column. I may be able to conjure something up that does the above, but it's been a minute since I wrote anything in Perl. Jonathan
Attachment
On 8/2/22 4:20 PM, Jonathan S. Katz wrote: > On 8/2/22 3:51 PM, Tom Lane wrote: >> "Jonathan S. Katz" <jkatz@postgresql.org> writes: >>> On 8/2/22 3:39 PM, Tom Lane wrote: >>>>> I am not in favor of disabling autovacuum in the test: ordinary >>>>> users are not going to do that while pg_upgrade'ing, so it'd make >>>>> the test less representative of real-world usage, which seems like >>>>> a bad idea. We could either drop this particular check again, or >>>>> weaken it to allow new relfrozenxid >= old relfrozenxid, likewise >>>>> relminxid. >> >>> The test does look helpful and it would catch regressions. Loosely >>> quoting Robert on a different point upthread, we don't want to turn off >>> the alarm just because it's spuriously going off. >>> I think the weakened check is OK (and possibly mimics the real-world >>> where autovacuum runs), unless you see a major drawback to it? >> >> I also think that ">=" is a sufficient requirement. It'd be a >> bit painful to test if we had to cope with potential XID wraparound, >> but we know that these installations haven't been around nearly >> long enough for that, so a plain ">=" test ought to be good enough. >> (Replacing the simple "eq" code with something that can handle that >> doesn't look like much fun, though.) > > ...if these systems are hitting XID wraparound, we have another issue to > worry about. > > I started modifying the test to support this behavior, but thought that > because 1. we want to ensure the OID is still equal and 2. in the > examples you showed, both relfrozenxid or relminxid could increment, we > may want to have the individual checks on each column. > > I may be able to conjure something up that does the above, but it's been > a minute since I wrote anything in Perl. Please see attached patch that does the above. Tests pass on my local environment (though I did not trigger autovacuum). Jonathan
Attachment
On Tue, Aug 2, 2022 at 3:51 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > The test does look helpful and it would catch regressions. Loosely > > quoting Robert on a different point upthread, we don't want to turn off > > the alarm just because it's spuriously going off. > > I think the weakened check is OK (and possibly mimics the real-world > > where autovacuum runs), unless you see a major drawback to it? > > I also think that ">=" is a sufficient requirement. It'd be a > bit painful to test if we had to cope with potential XID wraparound, > but we know that these installations haven't been around nearly > long enough for that, so a plain ">=" test ought to be good enough. > (Replacing the simple "eq" code with something that can handle that > doesn't look like much fun, though.) I don't really like this approach. Imagine that the code got broken in such a way that relfrozenxid and relminmxid were set to a value chosen at random - say, the contents of 4 bytes of unallocated memory that contained random garbage. Well, right now, the chances that this would cause a test failure are nearly 100%. With this change, they'd be nearly 0%. -- Robert Haas EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes: > On Tue, Aug 2, 2022 at 3:51 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I also think that ">=" is a sufficient requirement. > I don't really like this approach. Imagine that the code got broken in > such a way that relfrozenxid and relminmxid were set to a value chosen > at random - say, the contents of 4 bytes of unallocated memory that > contained random garbage. Well, right now, the chances that this would > cause a test failure are nearly 100%. With this change, they'd be > nearly 0%. If you have a different solution that you can implement by, say, tomorrow, then go for it. But I want to see some fix in there within about 24 hours, because 15beta3 wraps on Monday and we will need at least a few days to see if the buildfarm is actually stable with whatever solution is applied. A possible compromise is to allow new values that are between old value and old-value-plus-a-few-dozen. regards, tom lane
> On Aug 3, 2022, at 10:14 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Robert Haas <robertmhaas@gmail.com> writes: >>> On Tue, Aug 2, 2022 at 3:51 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> I also think that ">=" is a sufficient requirement. > >> I don't really like this approach. Imagine that the code got broken in >> such a way that relfrozenxid and relminmxid were set to a value chosen >> at random - say, the contents of 4 bytes of unallocated memory that >> contained random garbage. Well, right now, the chances that this would >> cause a test failure are nearly 100%. With this change, they'd be >> nearly 0%. > > If you have a different solution that you can implement by, say, > tomorrow, then go for it. But I want to see some fix in there > within about 24 hours, because 15beta3 wraps on Monday and we > will need at least a few days to see if the buildfarm is actually > stable with whatever solution is applied. Yeah, I would argue that the current proposal guards against the false positives as they currently stand. I do think Robert raises a fair point, but I wonder if another test would catch that? I don’t want to say “this would never happen” because, well, it could happen. But AIUI this would probably manifest itself in other places too? > A possible compromise is to allow new values that are between > old value and old-value-plus-a-few-dozen. Well, that’s kind of deterministic :-) I’m OK with that tweak, where “OK” means not thrilled, but I don’t see a better way to get more granular details (at least through my phone searches). I can probably have a tweak for this in a couple of hours if and when I’m on plane wifi. Jonathan
On Wed, Aug 3, 2022 at 10:13 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > If you have a different solution that you can implement by, say, > tomorrow, then go for it. But I want to see some fix in there > within about 24 hours, because 15beta3 wraps on Monday and we > will need at least a few days to see if the buildfarm is actually > stable with whatever solution is applied. I doubt that I can come up with something that quickly, so I guess we need some stopgap for now. -- Robert Haas EDB: http://www.enterprisedb.com
On Tue, Aug 2, 2022 at 12:32 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Hmmm ... now that you mention it, I see nothing in 002_pg_upgrade.pl > that attempts to turn off autovacuum on either the source server or > the destination. So one plausible theory is that autovac moved the > numbers since we checked. It's very easy to believe that my work in commit 0b018fab could make that happen, which is only a few months old. It's now completely routine for non-aggressive autovacuums to advance relfrozenxid by at least a small amount. For example, when autovacuum runs against either the tellers table or the branches table during a pgbench run, it will now advance relfrozenxid, every single time. And to a very recent value. This will happen in spite of the fact that no freezing ever takes place -- it's just a consequence of the oldest extant XID consistently being quite young each time, due to workload characteristics. -- Peter Geoghegan
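The effect is easy to see by hand on a v15/HEAD cluster; something like the following (a sketch using a pgbench table as in the example above; exact numbers will differ) will typically show relfrozenxid moving forward after a plain, non-aggressive VACUUM even though nothing was frozen:

    SELECT relfrozenxid FROM pg_class WHERE oid = 'pgbench_tellers'::regclass;
    VACUUM pgbench_tellers;
    SELECT relfrozenxid FROM pg_class WHERE oid = 'pgbench_tellers'::regclass;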
On Wed, Aug 3, 2022 at 6:59 AM Robert Haas <robertmhaas@gmail.com> wrote: > I don't really like this approach. Imagine that the code got broken in > such a way that relfrozenxid and relminmxid were set to a value chosen > at random - say, the contents of 4 bytes of unallocated memory that > contained random garbage. Well, right now, the chances that this would > cause a test failure are nearly 100%. With this change, they'd be > nearly 0%. If that kind of speculative bug existed, and somehow triggered before the concurrent autovacuum ran (which seems very likely to be the source of the test flappiness), then it would still be caught, most likely. VACUUM itself has the following defenses: * The defensive "can't happen" errors added to heap_prepare_freeze_tuple() and related freezing routines by commit 699bf7d0 in 2017, as hardening following the "freeze the dead" bug. That'll catch XIDs that are before the relfrozenxid at the start of the VACUUM (ditto for MXIDs/relminmxid). * The assertion added in my recent commit 0b018fab, which verifies that we're about to set relfrozenxid to something sane. * VACUUM now warns when it sees a *previous* relfrozenxid that's apparently "in the future", following recent commit e83ebfe6. This problem scenario is associated with several historic bugs in pg_upgrade, where for one reason or another it failed to carry forward correct relfrozenxid and/or relminmxid values for a table (see the commit message for references to those old pg_upgrade bugs). It might make sense to run a manual VACUUM right at the end of the test, so that you reliably get this kind of coverage, even without autovacuum. -- Peter Geoghegan
Hi, On 2022-08-03 09:59:40 -0400, Robert Haas wrote: > On Tue, Aug 2, 2022 at 3:51 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > > The test does look helpful and it would catch regressions. Loosely > > > quoting Robert on a different point upthread, we don't want to turn off > > > the alarm just because it's spuriously going off. > > > I think the weakened check is OK (and possibly mimics the real-world > > > where autovacuum runs), unless you see a major drawback to it? > > > > I also think that ">=" is a sufficient requirement. It'd be a > > bit painful to test if we had to cope with potential XID wraparound, > > but we know that these installations haven't been around nearly > > long enough for that, so a plain ">=" test ought to be good enough. > > (Replacing the simple "eq" code with something that can handle that > > doesn't look like much fun, though.) > > I don't really like this approach. Imagine that the code got broken in > such a way that relfrozenxid and relminmxid were set to a value chosen > at random - say, the contents of 4 bytes of unallocated memory that > contained random garbage. Well, right now, the chances that this would > cause a test failure are nearly 100%. With this change, they'd be > nearly 0%. Can't that pretty easily be addressed by subsequently querying txid_current(), and checking that the value isn't newer than that? Greetings, Andres Freund
On Wed, Aug 3, 2022 at 1:20 PM Andres Freund <andres@anarazel.de> wrote: > > I don't really like this approach. Imagine that the code got broken in > > such a way that relfrozenxid and relminmxid were set to a value chosen > > at random - say, the contents of 4 bytes of unallocated memory that > > contained random garbage. Well, right now, the chances that this would > > cause a test failure are nearly 100%. With this change, they'd be > > nearly 0%. > > Can't that pretty easily be addressed by subsequently querying txid_current(), > and checking that the value isn't newer than that? It couldn't hurt to do that as well, in passing (at the same time as testing that newrelfrozenxid >= oldrelfrozenxid directly). But deliberately running VACUUM afterwards seems like a good idea. We really ought to expect VACUUM to catch cases where relfrozenxid/relminmxid is faulty, at least in cases where it can be proven wrong by noticing some kind of inconsistency. -- Peter Geoghegan
Peter Geoghegan <pg@bowt.ie> writes: > It couldn't hurt to do that as well, in passing (at the same time as > testing that newrelfrozenxid >= oldrelfrozenxid directly). But > deliberately running VACUUM afterwards seems like a good idea. We > really ought to expect VACUUM to catch cases where > relfrozenxid/relminmxid is faulty, at least in cases where it can be > proven wrong by noticing some kind of inconsistency. That doesn't seem like it'd be all that thorough: we expect VACUUM to skip pages whenever possible. I'm also a bit concerned about the expense, though admittedly this test is ridiculously expensive already. regards, tom lane
On Wed, Aug 3, 2022 at 4:20 PM Andres Freund <andres@anarazel.de> wrote: > > I don't really like this approach. Imagine that the code got broken in > > such a way that relfrozenxid and relminmxid were set to a value chosen > > at random - say, the contents of 4 bytes of unallocated memory that > > contained random garbage. Well, right now, the chances that this would > > cause a test failure are nearly 100%. With this change, they'd be > > nearly 0%. > > Can't that pretty easily be addressed by subsequently querying txid_current(), > and checking that the value isn't newer than that? Hmm, maybe. The old cluster shouldn't have wrapped around ever, since we just created it. So the value in the new cluster should be >= that value and <= the result of txid_current() ignoring wraparound. Or we could disable autovacuum on the new cluster, which I think is a better solution. I like it when things match exactly; it makes me feel that the universe is well-ordered. -- Robert Haas EDB: http://www.enterprisedb.com
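Capturing that upper bound is cheap; roughly (a sketch -- txid_current() returns an epoch-qualified 64-bit value, so it has to be reduced mod 2^32 before comparing against a 32-bit relfrozenxid, and it assigns an XID of its own, which is fine for an upper bound):

    -- Run on the new cluster right after pg_upgrade; every preserved or
    -- vacuum-advanced relfrozenxid must be at or below this value.
    SELECT txid_current() % 4294967296 AS next_xid_low32;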
On Wed, Aug 3, 2022 at 1:34 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > That doesn't seem like it'd be all that thorough: we expect VACUUM > to skip pages whenever possible. I'm also a bit concerned about > the expense, though admittedly this test is ridiculously expensive > already. I bet the SKIP_PAGES_THRESHOLD stuff will be enough to make VACUUM visit every heap page in practice for a test case like this. That is all it takes to be able to safely advance relfrozenxid to whatever the oldest extant XID happened to be. However, I'm no fan of the SKIP_PAGES_THRESHOLD behavior, and already have plans to get rid of it -- so I wouldn't rely on that continuing to be true forever. It's probably not really necessary to have that kind of coverage in this particular test case. VACUUM will complain about weird relfrozenxid values in a large variety of contexts, even without assertions enabled. Mostly I was just saying: if we really do need test coverage of relfrozenxid in this context, then VACUUM is probably the way to go. -- Peter Geoghegan
Robert Haas <robertmhaas@gmail.com> writes: > Or we could disable autovacuum on the new cluster, which I think is a > better solution. I like it when things match exactly; it makes me feel > that the universe is well-ordered. Again, this seems to me to be breaking the test's real-world applicability for a (false?) sense of stability. regards, tom lane
On 2022-08-03 16:46:57 -0400, Tom Lane wrote: > Robert Haas <robertmhaas@gmail.com> writes: > > Or we could disable autovacuum on the new cluster, which I think is a > > better solution. I like it when things match exactly; it makes me feel > > that the universe is well-ordered. > > Again, this seems to me to be breaking the test's real-world applicability > for a (false?) sense of stability. Yea, that doesn't seem like an improvement. For example, I found the issues around relfilenode reuse in 15 due to autovacuum running in the pg_upgrade target cluster. And I recall other bugs in the area... Greetings, Andres Freund
On Wed, Aug 3, 2022 at 1:47 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Again, this seems to me to be breaking the test's real-world applicability > for a (false?) sense of stability. I agree. A lot of the VACUUM test flappiness issues we've had to deal with in the past now seem like problems with VACUUM itself, the test's design, or both. For example, why should we get a totally different pg_class.reltuples because we couldn't get a cleanup lock on some page? Why not just make sure to give the same answer either way, which happens to be the most useful behavior to the user? That way the test isn't just targeting implementation details. -- Peter Geoghegan
On 8/3/22 2:08 PM, Peter Geoghegan wrote: > On Wed, Aug 3, 2022 at 1:47 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Again, this seems to me to be breaking the test's real-world applicability >> for a (false?) sense of stability. > > I agree. > > A lot of the VACUUM test flappiness issues we've had to deal with in > the past now seem like problems with VACUUM itself, the test's design, > or both. For example, why should we get a totally different > pg_class.reltuples because we couldn't get a cleanup lock on some > page? Why not just make sure to give the same answer either way, > which happens to be the most useful behavior to the user? That way > the test isn't just targeting implementation details. After catching up (and reviewing approaches that could work while on poor wifi), it does make me wonder if we can have a useful test ready before beta 3. I did rule out wanting to do the "xid + $X" check after reviewing some of the output. I think that both $X could end up varying, and it really feels like a bandaid. Andres suggested upthread using "txid_current()" -- for the comparison, that's one thing I looked at. Would any of the XID info from "pg_control_checkpoint()" also serve for this test? If yes to the above, I should be able to modify this fairly quickly. Jonathan
"Jonathan S. Katz" <jkatz@postgresql.org> writes: > I did rule out wanting to do the "xid + $X" check after reviewing some > of the output. I think that both $X could end up varying, and it really > feels like a bandaid. It is that. I wouldn't feel comfortable with $X less than 100 or so, which is probably sloppy enough to draw Robert's ire. Still, realizing that what we want right now is a band-aid for 15beta3, I don't think it's an unreasonable short-term option. > Andres suggested upthread using "txid_current()" -- for the comparison, > that's one thing I looked at. Would any of the XID info from > "pg_control_checkpoint()" also serve for this test? I like the idea of txid_current(), but we have no comparable function for mxid do we? While you could get both numbers from pg_control_checkpoint(), I doubt that's sufficiently up-to-date. regards, tom lane
On 8/3/22 4:19 PM, Tom Lane wrote: > "Jonathan S. Katz" <jkatz@postgresql.org> writes: >> I did rule out wanting to do the "xid + $X" check after reviewing some >> of the output. I think that both $X could end up varying, and it really >> feels like a bandaid. > > It is that. I wouldn't feel comfortable with $X less than 100 or so, > which is probably sloppy enough to draw Robert's ire. Still, realizing > that what we want right now is a band-aid for 15beta3, I don't think > it's an unreasonable short-term option. Attached is the "band-aid / sloppy" version of the patch. Given that in the test examples I kept seeing deltas over 100 for relfrozenxid, I chose 1000. The mxid delta was less, but I kept it at 1000 for consistency (and because I hope this test is short-lived in this state), but I can be talked into otherwise. >> Andres suggested upthread using "txid_current()" -- for the comparison, >> that's one thing I looked at. Would any of the XID info from >> "pg_control_checkpoint()" also serve for this test? > > I like the idea of txid_current(), but we have no comparable > function for mxid do we? While you could get both numbers from > pg_control_checkpoint(), I doubt that's sufficiently up-to-date. ...unless we force a checkpoint in the test? Jonathan
Attachment
On Wed, Aug 3, 2022 at 7:19 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > "Jonathan S. Katz" <jkatz@postgresql.org> writes: > > I did rule out wanting to do the "xid + $X" check after reviewing some > > of the output. I think that both $X could end up varying, and it really > > feels like a bandaid. > > It is that. I wouldn't feel comfortable with $X less than 100 or so, > which is probably sloppy enough to draw Robert's ire. Still, realizing > that what we want right now is a band-aid for 15beta3, I don't think > it's an unreasonable short-term option. 100 << 2^32, so it's not terrible, but I'm honestly coming around to the view that we ought to just nuke this test case. From my point of view, the assertion that disabling autovacuum during this test case would make the test case useless seems to be incorrect. The original purpose of the test was to make sure that the pre-upgrade schema matched the post-upgrade schema. If having autovacuum running or not running affects that, we have a serious problem, but this test case isn't especially likely to find it, because whether autovacuum runs or not during the brief window where the test is running is totally unpredictable. Furthermore, if we do have such a problem, it would probably indicate that vacuum is using the wrong horizons to prune or test the visibility of the tuples. To find that out, we might want to compare values upon which the behavior of vacuum might depend, like relfrozenxid. But to do that, we have to disable autovacuum, so that the value can't change under us. From my point of view, that's making test coverage better, not worse, because any bugs in this area that can be found without explicit testing of relevant horizons are dependent on low-probability race conditions materializing in the buildfarm. If we disable autovacuum and then compare relfrozenxid and whatever else we care about explicitly, we can find bugs in that category reliably. However, if people don't accept that argument, then this new test case is kind of silly. It's not the worst idea in the world to use a threshold of 100 XIDs or something, but without disabling autovacuum, we're basically comparing two things that can't be expected to be equal, so we test and see if they're approximately equal and then call that good enough. I don't know that I believe we'll ever find a bug that way, though. -- Robert Haas EDB: http://www.enterprisedb.com
Attachment
On Thu, Aug 4, 2022 at 10:02 AM Jonathan S. Katz <jkatz@postgresql.org> wrote: > Attached is the "band-aid / sloppy" version of the patch. Given from the > test examples I kept seeing deltas over 100 for relfrozenxid, I chose > 1000. The mxid delta was less, but I kept it at 1000 for consistency > (and because I hope this test is short lived in this state), but can be > talked into otherwise. ISTM that you'd need to loop over the rows and do this for each row. Otherwise I think you're just comparing results for the first relation and ignoring all the rest. -- Robert Haas EDB: http://www.enterprisedb.com
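Something along these lines is what a per-relation version would have to look like (a sketch only, reusing the $old_horizons/$new_horizons captures from the earlier sketch and a plain numeric >=, i.e. still ignoring wraparound):

    my %old;
    foreach my $line (split /\n/, $old_horizons)
    {
        my ($relname, $relfrozenxid, $relminmxid) = split /\|/, $line;
        $old{$relname} = [ $relfrozenxid, $relminmxid ];
    }

    my $horizons_ok = 1;
    foreach my $line (split /\n/, $new_horizons)
    {
        my ($relname, $relfrozenxid, $relminmxid) = split /\|/, $line;
        $horizons_ok = 0
          unless defined $old{$relname}
          && $relfrozenxid >= $old{$relname}[0]
          && $relminmxid >= $old{$relname}[1];
    }
    ok($horizons_ok, 'no relation horizon moved backwards across pg_upgrade');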
"Jonathan S. Katz" <jkatz@postgresql.org> writes: > On 8/3/22 4:19 PM, Tom Lane wrote: >> I like the idea of txid_current(), but we have no comparable >> function for mxid do we? While you could get both numbers from >> pg_control_checkpoint(), I doubt that's sufficiently up-to-date. > ...unless we force a checkpoint in the test? Hmm ... maybe if you take a snapshot and hold that open while forcing the checkpoint and doing the subsequent checks. That seems messy though. Also, while that should serve to hold back global xmin, I'm not at all clear on whether that has a similar effect on minmxid. regards, tom lane
Robert Haas <robertmhaas@gmail.com> writes: > 100 << 2^32, so it's not terrible, but I'm honestly coming around to > the view that we ought to just nuke this test case. I'd hesitated to suggest that, but I think that's a fine solution. Especially since we can always put it back in later if we think of a more robust way. regards, tom lane
On Thu, Aug 4, 2022 at 10:26 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: > > 100 << 2^32, so it's not terrible, but I'm honestly coming around to > > the view that we ought to just nuke this test case. > > I'd hesitated to suggest that, but I think that's a fine solution. > Especially since we can always put it back in later if we think > of a more robust way. IMHO it's 100% clear how to make it robust. If you want to check that two values are the same, you can't let one of them be overwritten by an unrelated event in the middle of the check. There are many specific things we could do here, a few of which I proposed in my previous email, but they all boil down to "don't let autovacuum screw up the results". But if you don't want to do that, and you also don't want to have random failures, the only alternatives are weakening the check and removing the test. It's kind of hard to say which is better, but I'm inclined to think that if we just weaken the test we're going to think we've got coverage for this kind of problem when we really don't. -- Robert Haas EDB: http://www.enterprisedb.com
Hi, On 2022-08-04 12:43:49 -0400, Robert Haas wrote: > On Thu, Aug 4, 2022 at 10:26 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Robert Haas <robertmhaas@gmail.com> writes: > > > 100 << 2^32, so it's not terrible, but I'm honestly coming around to > > > the view that we ought to just nuke this test case. > > > > I'd hesitated to suggest that, but I think that's a fine solution. > > Especially since we can always put it back in later if we think > > of a more robust way. > > IMHO it's 100% clear how to make it robust. If you want to check that > two values are the same, you can't let one of them be overwritten by > an unrelated event in the middle of the check. There are many specific > things we could do here, a few of which I proposed in my previous > email, but they all boil down to "don't let autovacuum screw up the > results". > > But if you don't want to do that, and you also don't want to have > random failures, the only alternatives are weakening the check and > removing the test. It's kind of hard to say which is better, but I'm > inclined to think that if we just weaken the test we're going to think > we've got coverage for this kind of problem when we really don't. Why do you think it's better not to have the test than to have a very limited amount of fuzziness (by using the next xid as an upper limit)? What's the bug that will reliably pass the nextxid fuzzy comparison, but not an exact comparison? Greetings, Andres Freund
Robert Haas <robertmhaas@gmail.com> writes: > IMHO it's 100% clear how to make it robust. If you want to check that > two values are the same, you can't let one of them be overwritten by > an unrelated event in the middle of the check. There are many specific > things we could do here, a few of which I proposed in my previous > email, but they all boil down to "don't let autovacuum screw up the > results". It doesn't really matter how robust a test case is, if it isn't testing the thing you need to have tested. So I remain unwilling to disable autovac in a way that won't match real-world usage. Note that the patch you proposed at [1] will not fix anything. It turns off autovac in the new node, but the buildfarm failures we've seen appear to be due to autovac running on the old node. (I believe that autovac in the new node is *also* a hazard, but it seems to be a lot less of one, presumably because of timing considerations.) To make it work, we'd have to shut off autovac in the old node before starting pg_upgrade, and that would make it unacceptably (IMHO) different from what real users will do. Conceivably, we could move all of this processing into pg_upgrade itself --- autovac disable/re-enable and capturing of the horizon data --- and that would address my complaint. I don't really want to go there though, especially when in the final analysis IT IS NOT A BUG if a rel's horizons advance a bit during pg_upgrade. It's only a bug if they become inconsistent with the rel's data, which is not what this test is testing for. regards, tom lane [1] https://www.postgresql.org/message-id/CA%2BTgmoZkBcMi%2BNikxfc54dgkWj41Q%3DZ4nuyHpheTcxA-qfS5Qg%40mail.gmail.com
On Thu, Aug 4, 2022 at 9:44 AM Robert Haas <robertmhaas@gmail.com> wrote: > But if you don't want to do that, and you also don't want to have > random failures, the only alternatives are weakening the check and > removing the test. It's kind of hard to say which is better, but I'm > inclined to think that if we just weaken the test we're going to think > we've got coverage for this kind of problem when we really don't. Perhaps amcheck's verify_heapam() function can be used here. What could be better than exhaustively verifying that the relfrozenxid (and relminmxid) invariants hold for every single tuple in the table? Those are the exact conditions that we care about, as far as relfrozenxid/relminmxid goes. My sense is that that has a much better chance of detecting a real bug at some point. This approach is arguably an example of property-based testing. -- Peter Geoghegan
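Concretely, that would be something like this against the upgraded cluster (the heap-checking side of amcheck exists as of PostgreSQL 14; the call below is a sketch with all optional arguments left at their defaults):

    CREATE EXTENSION IF NOT EXISTS amcheck;

    -- One row per problem found; an empty result means no tuple whose
    -- xmin/xmax is inconsistent with relfrozenxid/relminmxid was detected.
    SELECT * FROM verify_heapam('pg_catalog.pg_largeobject');

The pg_amcheck frontend (also new in 14) can run the same checks across a whole cluster.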
Peter Geoghegan <pg@bowt.ie> writes:
> Perhaps amcheck's verify_heapam() function can be used here. What
> could be better than exhaustively verifying that the relfrozenxid (and
> relminmxid) invariants hold for every single tuple in the table?

How much will that add to the test's runtime? I could get behind this
idea if it's not exorbitantly expensive.

			regards, tom lane
On Thu, Aug 4, 2022 at 11:07 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> How much will that add to the test's runtime? I could get behind this
> idea if it's not exorbitantly expensive.

I'm not sure offhand, but I suspect it wouldn't be too bad.

--
Peter Geoghegan
On Thu, Aug 4, 2022 at 1:49 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Note that the patch you proposed at [1] will not fix anything.
> It turns off autovac in the new node, but the buildfarm failures
> we've seen appear to be due to autovac running on the old node.
> (I believe that autovac in the new node is *also* a hazard, but
> it seems to be a lot less of one, presumably because of timing
> considerations.) To make it work, we'd have to shut off autovac
> in the old node before starting pg_upgrade,

Yeah, that's a fair point.

> and that would make it
> unacceptably (IMHO) different from what real users will do.

I don't agree with that, but as you say, it is a matter of opinion.

In any case, what exactly do you want to do now? Jonathan Katz has
proposed a patch to do the fuzzy comparison which I believe to be
incorrect, because I think it compares, at most, the horizons for one
table in the database. I could go work on a better version of that, or
he could, or you could, but it seems like we're running out of time
awfully quick here, given that you wanted to have this resolved today
and it's almost the end of today.

I think the most practical alternative is to put this file back to the
way it was before I started tinkering with it, and revisit this issue
after the release. If you want to do something else, that's fine, but
I'm not going to be available to work on this issue over the weekend,
so if you want to do something else, you or someone else is going to
have to take responsibility for whatever further stabilization that
other approach may require between now and the release.

--
Robert Haas
EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes:
> I think the most practical alternative is to put this file back to the
> way it was before I started tinkering with it, and revisit this issue
> after the release.

Yeah, that seems like the right thing. We are running too low on time
to have any confidence that a modified version of the test will be
reliable.

			regards, tom lane
On Thu, Aug 4, 2022 at 12:59 PM Andres Freund <andres@anarazel.de> wrote:
> Why do you think it's better to not have the test than to have a very
> limited amount of fuzziness (by using the next xid as an upper limit)?
> What's the bug that will reliably pass the nextxid fuzzy comparison,
> but not an exact comparison?

I don't know. I mean, I guess one possibility is that the nextXid value
could be wrong too, because I doubt we have any separate test that
would catch that. But more generally, I don't have a lot of confidence
in fuzzy tests. It's too easy for things to look like they're working
when they really aren't.

Let's say that the value in the old cluster was 100 and the nextXid in
the new cluster is 1000. Well, it's not like every value between 100
and 1000 is OK. The overwhelming majority of those values could never
be produced, neither from the old cluster nor from any subsequent
vacuum. Given that the old cluster is suffering no new write
transactions, there's probably exactly two values that are legal: one
being the value from the old cluster, which we know, and the other
being whatever a vacuum of that table would produce, which we don't
know, although we do know that it's somewhere in that range.

Let's flip the question on its head: why should some hypothetical
future bug that we have in this area produce a value outside that
range? If it's a failure to set the value at all, or if it generates a
value at random, we'd likely still catch it. And those are pretty
likely, so maybe the value of such a test is not zero. On the other
hand, subtle breakage might be more likely to survive developer testing
than complete breakage.

--
Robert Haas
EDB: http://www.enterprisedb.com
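[To make the shape of that bounded comparison concrete, here is a rough
sketch run on the new cluster. It assumes the old cluster's values were
saved into a table called old_horizons beforehand; that table and its
capture step are hypothetical, and the xid-to-bigint comparison ignores
wraparound, which is only safe for a freshly created test cluster:]

    -- Flag any relation whose new relfrozenxid falls outside the range
    -- [old value, nextXid). pg_control_checkpoint() reports nextXid as
    -- of the last checkpoint, in "epoch:xid" text form.
    WITH next_xid AS (
        SELECT split_part(next_xid, ':', 2)::bigint AS xid
        FROM pg_control_checkpoint()
    )
    SELECT c.relname,
           o.relfrozenxid AS old_frozenxid,
           c.relfrozenxid AS new_frozenxid
    FROM pg_class c
    JOIN old_horizons o USING (relname)
    CROSS JOIN next_xid n
    WHERE c.relfrozenxid::text <> '0'
      AND (c.relfrozenxid::text::bigint < o.relfrozenxid::text::bigint
           OR c.relfrozenxid::text::bigint >= n.xid);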
On Thu, Aug 4, 2022 at 12:15 PM Robert Haas <robertmhaas@gmail.com> wrote:
> Given that the old cluster is suffering no new write
> transactions, there's probably exactly two values that are legal: one
> being the value from the old cluster, which we know, and the other
> being whatever a vacuum of that table would produce, which we don't
> know, although we do know that it's somewhere in that range.

What about autoanalyze?

--
Peter Geoghegan
On Thu, Aug 4, 2022 at 3:23 PM Peter Geoghegan <pg@bowt.ie> wrote:
> On Thu, Aug 4, 2022 at 12:15 PM Robert Haas <robertmhaas@gmail.com> wrote:
> > Given that the old cluster is suffering no new write
> > transactions, there's probably exactly two values that are legal: one
> > being the value from the old cluster, which we know, and the other
> > being whatever a vacuum of that table would produce, which we don't
> > know, although we do know that it's somewhere in that range.
>
> What about autoanalyze?

What about it?

--
Robert Haas
EDB: http://www.enterprisedb.com
On Thu, Aug 4, 2022 at 3:10 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > I think the most practical alternative is to put this file back to the
> > way it was before I started tinkering with it, and revisit this issue
> > after the release.
>
> Yeah, that seems like the right thing. We are running too low on time
> to have any confidence that a modified version of the test will be
> reliable.

Done.

--
Robert Haas
EDB: http://www.enterprisedb.com
On Thu, Aug 4, 2022 at 12:31 PM Robert Haas <robertmhaas@gmail.com> wrote:
> > What about autoanalyze?
>
> What about it?

It has a tendency to consume an XID, here or there, quite
unpredictably. I've noticed that this often involves an analyze of
pg_statistic. Have you accounted for that?

You said upthread that you don't like "fuzzy" tests, because it's too
easy for things to look like they're working when they really aren't.
I suppose that there may be some truth to that, but ISTM that there is
also a lot to be said for a test that can catch failures that weren't
specifically anticipated.

Users won't be running pg_upgrade with autovacuum disabled. And so ISTM
that just testing that relfrozenxid has been carried forward is more
precise about one particular detail (more precise than alternative
approaches to testing), but less precise about the thing that we
actually care about.

--
Peter Geoghegan
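[As an aside, whether autovacuum or autoanalyze has in fact been
touching the catalogs around the time of the check can be seen from the
standard statistics views; an illustrative query, not specific to the
pg_upgrade test:]

    -- Show which system catalogs (e.g. pg_statistic) autovacuum or
    -- autoanalyze has processed, and how often.
    SELECT relname, last_autovacuum, last_autoanalyze,
           autovacuum_count, autoanalyze_count
    FROM pg_stat_sys_tables
    WHERE schemaname = 'pg_catalog'
      AND (last_autovacuum IS NOT NULL OR last_autoanalyze IS NOT NULL)
    ORDER BY greatest(last_autovacuum, last_autoanalyze) DESC NULLS LAST;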
On Tue, Aug 2, 2022 at 03:32:05PM -0400, Tom Lane wrote:
> "Jonathan S. Katz" <jkatz@postgresql.org> writes:
> > On 8/2/22 1:12 PM, Tom Lane wrote:
> >> Sadly, we're still not out of the woods. I see three buildfarm
> >> failures in this test since Robert resolved the "-X" problem [1][2][3]:
>
> > Looking at the test code, is there anything that could have changed the
> > relfrozenxid or relminmxid independently of the test on these systems?
>
> Hmmm ... now that you mention it, I see nothing in 002_pg_upgrade.pl
> that attempts to turn off autovacuum on either the source server or
> the destination. So one plausible theory is that autovac moved the
> numbers since we checked.

Uh, pg_upgrade assumes autovacuum is not running, and tries to enforce
this:

start_postmaster()
...
    /*
     * Use -b to disable autovacuum.
     *
     * Turn off durability requirements to improve object creation speed, and
     * we only modify the new cluster, so only use it there. If there is a
     * crash, the new cluster has to be recreated anyway. fsync=off is a big
     * win on ext4.
     *
     * Force vacuum_defer_cleanup_age to 0 on the new cluster, so that
     * vacuumdb --freeze actually freezes the tuples.
     */
    snprintf(cmd, sizeof(cmd),
             "\"%s/pg_ctl\" -w -l \"%s/%s\" -D \"%s\" -o \"-p %d -b%s %s%s\" start",
             cluster->bindir, log_opts.logdir, SERVER_LOG_FILE,
             cluster->pgconfig, cluster->port,
             (cluster == &new_cluster) ?
             " -c synchronous_commit=off -c fsync=off -c full_page_writes=off -c vacuum_defer_cleanup_age=0" : "",
             cluster->pgopts ? cluster->pgopts : "",
             socket_string);

Perhaps the test script should do something similar, or this method
doesn't work anymore.

--
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Indecision is a decision.  Inaction is an action.  Mark Batterson
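[For illustration only, "something similar" in the test script could
amount to the following SQL run against the old cluster before the
upgrade and reverted afterwards; note that this is exactly the approach
Tom pushes back on below, so it is shown as a sketch, not a proposal:]

    -- On the old cluster, before running pg_upgrade:
    ALTER SYSTEM SET autovacuum = off;
    SELECT pg_reload_conf();

    -- After the upgrade (or on test teardown), restore the default:
    ALTER SYSTEM RESET autovacuum;
    SELECT pg_reload_conf();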
Bruce Momjian <bruce@momjian.us> writes:
>> Hmmm ... now that you mention it, I see nothing in 002_pg_upgrade.pl
>> that attempts to turn off autovacuum on either the source server or
>> the destination. So one plausible theory is that autovac moved the
>> numbers since we checked.

> Uh, pg_upgrade assumes autovacuum is not running, and tries to enforce
> this:

The problems come from autovac running before or after pg_upgrade.

> Perhaps the test script should do something similar,

I'm not on board with that, for the reasons I gave upthread.

			regards, tom lane
On Mon, Aug 8, 2022 at 09:51:46PM -0400, Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> >> Hmmm ... now that you mention it, I see nothing in 002_pg_upgrade.pl
> >> that attempts to turn off autovacuum on either the source server or
> >> the destination. So one plausible theory is that autovac moved the
> >> numbers since we checked.
>
> > Uh, pg_upgrade assumes autovacuum is not running, and tries to enforce
> > this:
>
> The problems come from autovac running before or after pg_upgrade.
>
> > Perhaps the test script should do something similar,
>
> I'm not on board with that, for the reasons I gave upthread.

Uh, I assume it is this paragraph:

> If that is the explanation, then it leaves us with few good options.
> I am not in favor of disabling autovacuum in the test: ordinary
> users are not going to do that while pg_upgrade'ing, so it'd make
> the test less representative of real-world usage, which seems like
> a bad idea. We could either drop this particular check again, or
> weaken it to allow new relfrozenxid >= old relfrozenxid, likewise
> relminmxid.

I thought the test was setting up a configuration that would never be
used by normal servers. Is that false?

--
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Indecision is a decision.  Inaction is an action.  Mark Batterson
Bruce Momjian <bruce@momjian.us> writes:
> I thought the test was setting up a configuration that would never be
> used by normal servers. Is that false?

*If* we made it disable autovac before starting pg_upgrade, then that
would be a process not used by normal users. I don't care whether
pg_upgrade disables autovac during its run; that's not what's at issue
here.

			regards, tom lane