Thread: pgsql: Trial fix for old cross-version upgrades.

pgsql: Trial fix for old cross-version upgrades.

From
Jeff Davis
Date:
Trial fix for old cross-version upgrades.

Per buildfarm and reports, it seems that 9.X to 18 upgrades were
failing after commit 1fd1bd8710 due to an incorrect regex. Loosen the
regex to accommodate older versions.

Reported-by: vignesh C <vignesh21@gmail.com>
Reported-by: Andrew Dunstan <andrew@dunslane.net>
Discussion: https://postgr.es/m/CALDaNm3GUs+U8Nt4S=V5zmb+K8-RfAc03vRENS0teeoq0Lc6Tw@mail.gmail.com
Discussion: https://postgr.es/m/ea4cbbc1-c5a5-43d1-9618-8ff3f2155bfe@dunslane.net

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/ab84d0ff806dd791ea9da5f1ca302daf3cf42980

Modified Files
--------------
src/test/perl/PostgreSQL/Test/AdjustUpgrade.pm | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)


Re: pgsql: Trial fix for old cross-version upgrades.

From
Jeff Davis
Date:
On Thu, 2025-02-20 at 18:30 +0000, Jeff Davis wrote:
> Trial fix for old cross-version upgrades.

That fixed one problem, but there are other problems with 9.2 upgrades.
And it looks like there's also a problem with upgrades from 12.
Investigating...

Regards,
    Jeff Davis




Re: pgsql: Trial fix for old cross-version upgrades.

From
Jeff Davis
Date:
On Thu, 2025-02-20 at 13:17 -0800, Jeff Davis wrote:
> On Thu, 2025-02-20 at 18:30 +0000, Jeff Davis wrote:
> > Trial fix for old cross-version upgrades.
>
> That fixed one problem, but there are other problems with 9.2
> upgrades.

The version that I committed had the following change to
002_pg_upgrade.pl:

  # Stabilize stats before pg_dumpall.
  $oldnode->append_conf('postgresql.conf', 'autovacuum = off');
  $oldnode->restart;

  ...

  $newnode->append_conf('postgresql.conf', 'autovacuum = off');

I think we need a similar change in the buildfarm client's
TestUpgradeXversion.pm? Is -hackers the right place to discuss that?

Regards,
    Jeff Davis




Re: pgsql: Trial fix for old cross-version upgrades.

From
Robert Haas
Date:
On Fri, Feb 21, 2025 at 7:57 PM Jeff Davis <pgsql@j-davis.com> wrote:
> I think we need a similar change in the buildfarm client's
> TestUpgradeXversion.pm? Is -hackers the right place to discuss that?

Yes, I believe -hackers is the right place to discuss that.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: pgsql: Trial fix for old cross-version upgrades.

From
Tom Lane
Date:
Jeff Davis <pgsql@j-davis.com> writes:
> The version that I committed had the following change to
> 002_pg_upgrade.pl:

>   # Stabilize stats before pg_dumpall.
>   $oldnode->append_conf('postgresql.conf', 'autovacuum = off');
>   $oldnode->restart;

>   ...

>   $newnode->append_conf('postgresql.conf', 'autovacuum = off');

> I think we need a similar change in the buildfarm client's
> TestUpgradeXversion.pm? Is -hackers the right place to discuss that?

I think we might indeed want that, but it doesn't seem to be the
explanation for the buildfarm failures, because the diffs look
to be consistent across runs which you'd not expect from
autovacuum-driven changes.  I suspect that the problem is that
pg_dump is interpreting old-version stats in some way that doesn't
match up with what we get from restoring the dump.

I did experiment with the attached very-quick-n-dirty patch, which
should succeed in suppressing autovacuum in both the old and new
versions if I understand the code correctly (which I might well not).
It made no difference at all in the dump diffs ...

            regards, tom lane

--- PGBuild/Modules/TestUpgradeXversion.pm~    2024-11-02 01:47:21.000000000 -0400
+++ PGBuild/Modules/TestUpgradeXversion.pm    2025-02-21 20:50:35.705564091 -0500
@@ -419,7 +419,7 @@ sub test_upgrade    ## no critic (Subrou
     # run in which it was set up, which will be gone by now, so we repoint
     # it to the current run's tmpdir.
     # listen_addresses will be set correctly and requires no adjustment.
-    if (!$using_localhost)
+    # In any case, disable autovacuum to prevent stats changing under us.
     {
         my $tdir = $tmpdir;
         $tdir =~ s!\\!/!g;
@@ -431,7 +431,9 @@ sub test_upgrade    ## no critic (Subrou
         $param = "unix_socket_directory"
           if $oversion ne 'HEAD' && $oversion lt 'REL9_3_STABLE';
         print $opgconf "\n# Configuration added by buildfarm client\n\n";
-        print $opgconf "$param = '$tdir'\n";
+        print $opgconf "$param = '$tdir'\n"
+            if (!$using_localhost);
+        print $opgconf "autovacuum = off\n";
         close($opgconf);
     }

@@ -507,7 +509,7 @@ sub test_upgrade    ## no critic (Subrou
           . qq{> "$upgrade_loc/$oversion-initdb.log" 2>&1});
     return if $?;

-    unless ($using_localhost)
+    # Again, adjust connection location and disable autovacuum.
     {
         open(my $pgconf, ">>", "$installdir/$oversion-upgrade/postgresql.conf")
           || die "opening $installdir/$oversion-upgrade/postgresql.conf: $!";
@@ -515,8 +517,12 @@ sub test_upgrade    ## no critic (Subrou
         $tmp_param = "unix_socket_directory"
           if $this_branch ne 'HEAD' && $this_branch lt 'REL9_3_STABLE';
         print $pgconf "\n# Configuration added by buildfarm client\n\n";
-        print $pgconf "listen_addresses = ''\n";
-        print $pgconf "$tmp_param = '$tmpdir'\n";
+        unless ($using_localhost)
+        {
+            print $pgconf "listen_addresses = ''\n";
+            print $pgconf "$tmp_param = '$tmpdir'\n";
+        }
+        print $pgconf "autovacuum = off\n";
         close($pgconf);
     }


Re: pgsql: Trial fix for old cross-version upgrades.

From
Jeff Davis
Date:
On Fri, 2025-02-21 at 21:00 -0500, Tom Lane wrote:
> I think we might indeed want that, but it doesn't seem to be the
> explanation for the buildfarm failures, because the diffs look
> to be consistent across runs which you'd not expect from
> autovacuum-driven changes.  I suspect that the problem is that
> pg_dump is interpreting old-version stats in some way that doesn't
> match up with what we get from restoring the dump.

I agree it doesn't explain the failures.

> I did experiment with the attached very-quick-n-dirty patch, which
> should succeed in suppressing autovacuum in both the old and new
> versions if I understand the code correctly (which I might well not).
> It made no difference at all in the dump diffs ...

In 002_pg_upgrade.pl, I disabled autovacuum and restarted after the
regression run. In other words, in the old cluster, autovacuum did have
a chance to run, just not after the first dumpall.

Regards,
    Jeff Davis




Re: pgsql: Trial fix for old cross-version upgrades.

From
Jeff Davis
Date:
On Fri, 2025-02-21 at 21:57 -0500, Tom Lane wrote:
> Hmm.  I forced my local BF installation to run a v17-to-HEAD upgrade
> test, and it still failed, though seemingly with fewer diffs than
> the buildfarm is reporting for older branches.  (Diffs attached for
> amusement's sake.)  I don't believe we've made any definitional
> changes in the contents of pg_statistic since v17, so whatever's
> going on here seems a little subtler than I was hoping.

Also the non-buildfarm tests (src/bin/pg_upgrade/TESTING) are all
passing for versions 10+.

> I wonder if it'd be a good idea to rearrange TestUpgradeXversion.pm
> so that instead of testing upgrades from oldest prior version to
> newest, it tested from newest to oldest?  My thought here is that
> the oldest cases are most likely to fail, and when they do, it'd
> be valuable information to know which branches still work.

That's a good idea and would be a big help.

Regards,
    Jeff Davis




Re: pgsql: Trial fix for old cross-version upgrades.

From
Andrew Dunstan
Date:
On 2025-02-21 Fr 10:11 PM, Tom Lane wrote:
> Jeff Davis <pgsql@j-davis.com> writes:
>> In 002_pg_upgrade.pl, I disabled autovacuum and restarted after the
>> regression run. In other words, in the old cluster, autovacuum did have
>> a chance to run, just not after the first dumpall.
> The hack I posted should prevent autovacuum from running either
> before or after dumping, in either cluster.
>
> However, it occurred to me to try forcing a HEAD-to-HEAD upgrade,
> a case the buildfarm animals have not been able to reach.
> And *it failed*!?  (Diffs attached.)  So that eliminates the
> theory that this is a cross-version compatibility problem, or
> at least that is not our only problem.  It seems there is
> something different between what TestUpgradeXversion.pm is doing
> and what 002_pg_upgrade.pl is doing.  No clue what, although it
> does look like an additional round of analyze'ing has added more
> stats than were there before.
>
>             



Yeah, I don't know.


Here's what I have so far:

. for HEAD/18 disable running the analyze script / vacuumdb --analyze.

. turn off autovacuum on the old and upgraded database.

. reverse the order of testing so we do newest first


What I'm thinking of doing is running all the eligible upgrades rather 
than stopping on the first failure.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: pgsql: Trial fix for old cross-version upgrades.

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> On 2025-02-21 Fr 10:11 PM, Tom Lane wrote:
>> ... It seems there is
>> something different between what TestUpgradeXversion.pm is doing
>> and what 002_pg_upgrade.pl is doing.  No clue what, although it
>> does look like an additional round of analyze'ing has added more
>> stats than were there before.

Hah!  Looking at the script with less bleary eyes, I see it does
this after pg_upgrade:

    if (-e "$installdir/analyze_new_cluster.sh")
    {
        system( "cd $installdir && sh ./analyze_new_cluster.sh "
              . qq{> "$upgrade_loc/$oversion-analyse.log" 2>&1 });
        return if $?;
    }
    else
    {
        system( qq{"$installdir/bin/vacuumdb" --all --analyze-only }
              . qq{> "$upgrade_loc/$oversion-analyse.log" 2>&1 });
        return if $?;
    }

So there's our extra round of ANALYZE right there.  I diked out the
vacuumdb call and things are working much better.  It seems to pass
for branches back through 9.3, and upgrade from 9.2 has only some
diffs for relallvisible (see attached).  We still need to figure out
why that is, but a quick-n-dirty patch could just be to make the dump
comparison logic ignore relallvisible diffs.

We might want to make this vacuumdb invocation dependent on version,
but I could also see just removing it entirely.

> Here's what I have so far:

> . for HEAD/18 disable running the analyze script / vacuumdb --analyze.
> . turn off autovacuum on the old and upgraded database.
> . reverse the order of testing so we do newest first

Check.

> What I'm thinking of doing is running all the eligible upgrades rather 
> than stopping on the first failure.

Seems like possibly wasted work --- I'd be content with
newest-to-oldest.

            regards, tom lane

--- PGBuild/Modules/TestUpgradeXversion.pm.orig    2024-11-02 01:47:21.000000000 -0400
+++ PGBuild/Modules/TestUpgradeXversion.pm    2025-02-22 12:11:31.924183704 -0500
@@ -419,7 +419,7 @@ sub test_upgrade    ## no critic (Subrou
     # run in which it was set up, which will be gone by now, so we repoint
     # it to the current run's tmpdir.
     # listen_addresses will be set correctly and requires no adjustment.
-    if (!$using_localhost)
+    # In any case, disable autovacuum to prevent stats changing under us.
     {
         my $tdir = $tmpdir;
         $tdir =~ s!\\!/!g;
@@ -431,7 +431,9 @@ sub test_upgrade    ## no critic (Subrou
         $param = "unix_socket_directory"
           if $oversion ne 'HEAD' && $oversion lt 'REL9_3_STABLE';
         print $opgconf "\n# Configuration added by buildfarm client\n\n";
-        print $opgconf "$param = '$tdir'\n";
+        print $opgconf "$param = '$tdir'\n"
+            if (!$using_localhost);
+        print $opgconf "autovacuum = off\n";
         close($opgconf);
     }

@@ -507,7 +509,7 @@ sub test_upgrade    ## no critic (Subrou
           . qq{> "$upgrade_loc/$oversion-initdb.log" 2>&1});
     return if $?;

-    unless ($using_localhost)
+    # Again, adjust connection location and disable autovacuum.
     {
         open(my $pgconf, ">>", "$installdir/$oversion-upgrade/postgresql.conf")
           || die "opening $installdir/$oversion-upgrade/postgresql.conf: $!";
@@ -515,8 +517,12 @@ sub test_upgrade    ## no critic (Subrou
         $tmp_param = "unix_socket_directory"
           if $this_branch ne 'HEAD' && $this_branch lt 'REL9_3_STABLE';
         print $pgconf "\n# Configuration added by buildfarm client\n\n";
-        print $pgconf "listen_addresses = ''\n";
-        print $pgconf "$tmp_param = '$tmpdir'\n";
+        unless ($using_localhost)
+        {
+            print $pgconf "listen_addresses = ''\n";
+            print $pgconf "$tmp_param = '$tmpdir'\n";
+        }
+        print $pgconf "autovacuum = off\n";
         close($pgconf);
     }

@@ -573,9 +579,9 @@ sub test_upgrade    ## no critic (Subrou
     }
     else
     {
-        system( qq{"$installdir/bin/vacuumdb" --all --analyze-only }
-              . qq{> "$upgrade_loc/$oversion-analyse.log" 2>&1 });
-        return if $?;
+#        system( qq{"$installdir/bin/vacuumdb" --all --analyze-only }
+#              . qq{> "$upgrade_loc/$oversion-analyse.log" 2>&1 });
+#        return if $?;
     }

     if (-e "$installdir/reindex_hash.sql")
--- /home/buildfarm/bf-data/upgrade.tester/HEAD/origin-REL9_2_STABLE.sql.fixed    2025-02-22 12:55:24.073548904 -0500
+++ /home/buildfarm/bf-data/upgrade.tester/HEAD/converted-REL9_2_STABLE-to-HEAD.sql.fixed    2025-02-22
12:55:24.074548905-0500 
@@ -208450,7 +208450,7 @@
     'version', '000000'::integer,
     'relpages', '55'::integer,
     'reltuples', '10000'::real,
-    'relallvisible', '55'::integer
+    'relallvisible', '53'::integer
 );
 SELECT * FROM pg_catalog.pg_restore_attribute_stats(
     'relation', 'public.hash_f8_heap'::regclass,
@@ -208482,7 +208482,7 @@
     'version', '000000'::integer,
     'relpages', '45'::integer,
     'reltuples', '10000'::real,
-    'relallvisible', '45'::integer
+    'relallvisible', '42'::integer
 );
 SELECT * FROM pg_catalog.pg_restore_attribute_stats(
     'relation', 'public.hash_i4_heap'::regclass,
@@ -208514,7 +208514,7 @@
     'version', '000000'::integer,
     'relpages', '124'::integer,
     'reltuples', '10000'::real,
-    'relallvisible', '124'::integer
+    'relallvisible', '122'::integer
 );
 SELECT * FROM pg_catalog.pg_restore_attribute_stats(
     'relation', 'public.hash_name_heap'::regclass,
@@ -208546,7 +208546,7 @@
     'version', '000000'::integer,
     'relpages', '55'::integer,
     'reltuples', '10000'::real,
-    'relallvisible', '55'::integer
+    'relallvisible', '52'::integer
 );
 SELECT * FROM pg_catalog.pg_restore_attribute_stats(
     'relation', 'public.hash_txt_heap'::regclass,

Re: pgsql: Trial fix for old cross-version upgrades.

From
Andrew Dunstan
Date:
On 2025-02-22 Sa 1:34 PM, Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> On 2025-02-21 Fr 10:11 PM, Tom Lane wrote:
>>> ... It seems there is
>>> something different between what TestUpgradeXversion.pm is doing
>>> and what 002_pg_upgrade.pl is doing.  No clue what, although it
>>> does look like an additional round of analyze'ing has added more
>>> stats than were there before.
> Hah!  Looking at the script with less bleary eyes, I see it does
> this after pg_upgrade:
>
>      if (-e "$installdir/analyze_new_cluster.sh")
>      {
>          system( "cd $installdir && sh ./analyze_new_cluster.sh "
>                . qq{> "$upgrade_loc/$oversion-analyse.log" 2>&1 });
>          return if $?;
>      }
>      else
>      {
>          system( qq{"$installdir/bin/vacuumdb" --all --analyze-only }
>                . qq{> "$upgrade_loc/$oversion-analyse.log" 2>&1 });
>          return if $?;
>      }
>
> So there's our extra round of ANALYZE right there.  I diked out the
> vacuumdb call and things are working much better.  It seems to pass
> for branches back through 9.3, and upgrade from 9.2 has only some
> diffs for relallvisible (see attached).  We still need to figure out
> why that is, but a quick-n-dirty patch could just be to make the dump
> comparison logic ignore relallvisible diffs.
>
> We might want to make this vacuumdb invocation dependent on version,
> but I could also see just removing it entirely.
>
>> Here's what I have so far:
>> . for HEAD/18 disable running the analyze script / vacuumdb --analyze.
>> . turn off autovacuum on the old and upgraded database.
>> . reverse the order of testing so we do newest first
> Check.
>
>> What I'm thinking of doing is running all the eligible upgrades rather
>> than stopping on the first failure.
> Seems like possibly wasted work --- I'd be content with
> newest-to-oldest.
>
>             




OK, went with that. crake is getting a failure at 9.6 like you saw with 
9.2, but things are a whole lot better than they were.


I have updated drongo and fairywren with the new code too, so they 
should also improve before long.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: pgsql: Trial fix for old cross-version upgrades.

From
Tom Lane
Date:
I wrote:
> So there's our extra round of ANALYZE right there.  I diked out the
> vacuumdb call and things are working much better.  It seems to pass
> for branches back through 9.3, and upgrade from 9.2 has only some
> diffs for relallvisible (see attached).

I confess that when I wrote that, I hadn't actually tested every
single branch as an upgrade source.  After seeing the latest result
from crake (which failed at 9.6), I went back and filled in the gaps.
I can now say that 9.2 and 9.6 fail, but every other branch works.
What's more, the diffs from 9.2 and 9.6 are identical, and match
what crake is showing for 9.6.  That's just crazy --- I would be
unsurprised if a range of back releases were misbehaving in the same
way, but not two isolated branches.

Furthermore, it can't be a coincidence that the four tables we are
seeing relallvisible diffs for are exactly the four tables in the
regression database that have hash indexes.

But I'm baffled where to look beyond that.  I could believe that
CREATE INDEX with a hash index misbehaves by changing the
relallvisible value even when we're doing a binary upgrade --- but
such a bug would be on the restoring side, so how would it be
sensitive to the source branch?  I'm confused.

            regards, tom lane



Re: pgsql: Trial fix for old cross-version upgrades.

From
Jeff Davis
Date:
On Sat, 2025-02-22 at 20:15 -0500, Tom Lane wrote:
> That's just crazy --- I would be
> unsurprised if a range of back releases were misbehaving in the same
> way, but not two isolated branches.
>
> Furthermore, it can't be a coincidence that the four tables we are
> seeing relallvisible diffs for are exactly the four tables in the
> regression database that have hash indexes.
>
> But I'm baffled where to look beyond that.  I could believe that
> CREATE INDEX with a hash index misbehaves by changing the
> relallvisible value even when we're doing a binary upgrade --- but
> such a bug would be on the restoring side, so how would it be
> sensitive to the source branch?  I'm confused.

It's also strange that copperhead is consistently failing on 12 with:

  pg_restore: while PROCESSING TOC:
  pg_restore: from TOC entry 4163; 0 0 STATISTICS DATA "vcharidx" (no
owner)
  pg_restore: error: could not execute query: ERROR:  column "text" of
relation "vcharidx" does not exist


https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=copperhead&dt=2025-02-22%2009%3A10%3A36&stg=xversion-upgrade-REL_12_STABLE-HEAD

I was puzzling through whether the attribute name uniqueness logic was
doing something strange, but it's very simple. And the table and index
should both be locked at the point of the syscache lookup.

Regards,
    Jeff Davis




Re: pgsql: Trial fix for old cross-version upgrades.

From
Tom Lane
Date:
I wrote:
> Furthermore, it can't be a coincidence that the four tables we are
> seeing relallvisible diffs for are exactly the four tables in the
> regression database that have hash indexes.

Oh!  If the source version is <= 9.6, old_9_6_invalidate_hash_indexes()
generates a script "reindex_hash.sql" to reindex all hash indexes.
And TestUpgradeXversion faithfully executes that, *before* making
the new-database comparison dump.

I added some instrumentation and confirmed that these tables'
relallvisible values match between the pre-dump state on the old
database and the immediately-post-upgrade state on the new database.
It's definitely reindex_hash.sql that's changing them.  This doesn't
in itself explain why 9.3-9.5 don't show the problem, but I noticed
something interesting: it's the pre-dump state that is out of line
in 9.2 and 9.6.  All the other cases show relallvisible that's a
couple pages less than relpages, but in those two branches we
start from a state that claims these rels are fully all-visible,
and then seemingly REINDEX discovers that that's not so.
What I'm guessing is that the variance is due to changes in
vacuum/analyze's heuristics for updating pg_class.relallvisible
after a partial scan of the table.

Anyway, we know where the culprit is, and I'm not sure that
explaining the differences in behavior of long-dead branches
is an exciting use of time.

Question is, what to do about this?  We probably want to continue to
test that reindex_hash.sql does something sane, so just deleting that
step won't do.  I experimented with moving it from immediately before
the pg_dumpall of the new installation to immediately after, but that
didn't work at all: the indexes are marked invalid and so pg_dump just
doesn't dump them, so we still end up with a diff in the dump output.

I'm not really seeing a better answer than hacking the comparison
rules to ignore the relallvisible difference for these tables.
That's darn ugly, and I suspect the implementation will be messy,
but do we have another way?

            regards, tom lane



Re: pgsql: Trial fix for old cross-version upgrades.

From
Tom Lane
Date:
Jeff Davis <pgsql@j-davis.com> writes:
> It's also strange that copperhead is consistently failing on 12 with:

>   pg_restore: while PROCESSING TOC:
>   pg_restore: from TOC entry 4163; 0 0 STATISTICS DATA "vcharidx" (no
> owner)
>   pg_restore: error: could not execute query: ERROR:  column "text" of
> relation "vcharidx" does not exist

Ugh.  I see what is happening, and it's going to be problematic to
fix: our heuristics for assigning names to index expression columns
are not very consistent/stable.  It didn't matter up to now, but
this patch assumes that it can reference index columns by name.
The index in question is made in btree_gist's tests:

CREATE INDEX vcharidx ON vchartmp USING GIST ( text(a) );

The index column will be given the name "text".  However, it
dumps as

CREATE INDEX vcharidx ON public.vchartmp USING gist (((a)::text));

and when *that* gets loaded, the index column is given the name
"a", because FigureColname treats function-like constructs
differently from cast-like constructs.  Then
pg_restore_attribute_stats unsurprisingly fails.  I think the
reason that Andrew and I aren't seeing that is that we are
using machines that are fast enough to shut down the DB before
autovacuum gets around to populating some stats for this
expression index.

We have dealt with some similar issues in the past, and the
solution was to allow index columns to be referenced by
column number not name.  (ALTER INDEX ... ALTER COLUMN ...
SET STATISTICS does that, not sure if there are other places.)
Recommend adopting the same solution here.

            regards, tom lane



Re: pgsql: Trial fix for old cross-version upgrades.

From
Andrew Dunstan
Date:


On 2025-02-22 Sa 9:20 PM, Tom Lane wrote:
I wrote:
Furthermore, it can't be a coincidence that the four tables we are
seeing relallvisible diffs for are exactly the four tables in the
regression database that have hash indexes.
Oh!  If the source version is <= 9.6, old_9_6_invalidate_hash_indexes()
generates a script "reindex_hash.sql" to reindex all hash indexes.
And TestUpgradeXversion faithfully executes that, *before* making
the new-database comparison dump.

I added some instrumentation and confirmed that these tables'
relallvisible values match between the pre-dump state on the old
database and the immediately-post-upgrade state on the new database.
It's definitely reindex_hash.sql that's changing them.  This doesn't
in itself explain why 9.3-9.5 don't show the problem, but I noticed
something interesting: it's the pre-dump state that is out of line
in 9.2 and 9.6.  All the other cases show relallvisible that's a
couple pages less than relpages, but in those two branches we
start from a state that claims these rels are fully all-visible,
and then seemingly REINDEX discovers that that's not so.
What I'm guessing is that the variance is due to changes in
vacuum/analyze's heuristics for updating pg_class.relallvisible
after a partial scan of the table.

Anyway, we know where the culprit is, and I'm not sure that
explaining the differences in behavior of long-dead branches
is an exciting use of time.

Question is, what to do about this?  We probably want to continue to
test that reindex_hash.sql does something sane, so just deleting that
step won't do.  I experimented with moving it from immediately before
the pg_dumpall of the new installation to immediately after, but that
didn't work at all: the indexes are marked invalid and so pg_dump just
doesn't dump them, so we still end up with a diff in the dump output.

I'm not really seeing a better answer than hacking the comparison
rules to ignore the relallvisible difference for these tables.
That's darn ugly, and I suspect the implementation will be messy,
but do we have another way?
			


Having slept on it I can't see anything better. It's only for very old branches, and nothing is really going wrong here, so ignoring the difference seems quite reasonable.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

Re: pgsql: Trial fix for old cross-version upgrades.

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> On 2025-02-22 Sa 9:20 PM, Tom Lane wrote:
>> I'm not really seeing a better answer than hacking the comparison
>> rules to ignore the relallvisible difference for these tables.
>> That's darn ugly, and I suspect the implementation will be messy,
>> but do we have another way?

> Having slept on it I can't see anything better. It's only for very old 
> branches, and nothing is really going wrong here, so ignoring the 
> difference seems quite reasonable.

Agreed, and done.

            regards, tom lane



Re: pgsql: Trial fix for old cross-version upgrades.

From
Jeff Davis
Date:
On Sat, 2025-02-22 at 21:48 -0500, Tom Lane wrote:
> CREATE INDEX vcharidx ON vchartmp USING GIST ( text(a) );
>
> The index column will be given the name "text".  However, it
> dumps as
>
> CREATE INDEX vcharidx ON public.vchartmp USING gist (((a)::text));

Thank you! I dismissed the naming issue too early, going down the path
of something related to GiST.

> and when *that* gets loaded, the index column is given the name
> "a", because FigureColname treats function-like constructs
> differently from cast-like constructs.

Ugh.

> We have dealt with some similar issues in the past, and the
> solution was to allow index columns to be referenced by
> column number not name.  (ALTER INDEX ... ALTER COLUMN ...
> SET STATISTICS does that, not sure if there are other places.)
> Recommend adopting the same solution here.

I'll submit a patch to the -hackers thread. There are a few minor
choices here that might get some discussion.

Regards,
    Jeff Davis




Re: pgsql: Trial fix for old cross-version upgrades.

From
Sami Imseih
Date:
I was running "make check-world" this morning on a machine that has an
old version of perl, 5.16, and the check-world was hung on
/usr/bin/perl t/002_pg_upgrade.pl.

I never saw "make check-world" hang on this old version of perl,
but I bisected the regression to this commit fc0d0ce978752493,
the subject of this thread.

This is the pstack of the stuck perl process.

#0  0x00007f5964b4b1dc in Perl_fbm_instr () from
/usr/lib64/perl5/CORE/libperl.so
#1  0x00007f5964bd146f in Perl_re_intuit_start () from
/usr/lib64/perl5/CORE/libperl.so
#2  0x00007f5964bd3400 in Perl_regexec_flags () from
/usr/lib64/perl5/CORE/libperl.so
#3  0x00007f5964b6d3fc in Perl_pp_subst () from /usr/lib64/perl5/CORE/libperl.so
#4  0x00007f5964b67e1e in Perl_runops_standard () from
/usr/lib64/perl5/CORE/libperl.so
#5  0x00007f5964b07463 in perl_run () from /usr/lib64/perl5/CORE/libperl.so
#6  0x0000000000400cd9 in main ()

and specifically, the process hangs with this specific change.

-       $dump =~ s ['version', '\d+'::integer,]
-               ['version', '000000'::integer,]mg;
+       $dump =~ s {(^\s+'version',) '\d+'::integer,$}
+               {$1 '000000'::integer,}mg;


I repro'd on several machines with 5.16, and then issue went away once
I upgraded to a  more recent version of perl; at least with 5.25, the
issue does
not occur.

I don't know if we need to do anything here, except for making sure we are
building with the most recent version of perl, but I wanted to mention this
here for awareness.

--

Sami



Re: pgsql: Trial fix for old cross-version upgrades.

From
Tom Lane
Date:
Sami Imseih <samimseih@gmail.com> writes:
> I was running "make check-world" this morning on a machine that has an
> old version of perl, 5.16, and the check-world was hung on
> /usr/bin/perl t/002_pg_upgrade.pl.

Interesting!

> and specifically, the process hangs with this specific change.

> -       $dump =~ s ['version', '\d+'::integer,]
> -               ['version', '000000'::integer,]mg;
> +       $dump =~ s {(^\s+'version',) '\d+'::integer,$}
> +               {$1 '000000'::integer,}mg;

5.16 is still supported according to our install instructions,
so let's see if we can adjust that regex so it works with 5.16.
The first thing I'd try is

-       $dump =~ s {(^\s+'version',) '\d+'::integer,$}
+       $dump =~ s {^(\s+'version',) '\d+'::integer,$}

since I notice we mostly don't put ^ inside capture parens elsewhere.
Are you in a position to test that?

            regards, tom lane



Re: pgsql: Trial fix for old cross-version upgrades.

From
Sami Imseih
Date:
> 5.16 is still supported according to our install instructions,
> so let's see if we can adjust that regex so it works with 5.16.
> The first thing I'd try is
>
> -       $dump =~ s {(^\s+'version',) '\d+'::integer,$}
> +       $dump =~ s {^(\s+'version',) '\d+'::integer,$}
>

With the diff below, I still observed the same issue.

@@ -296,7 +296,7 @@ sub adjust_old_dumpfile

        # Same with version argument to pg_restore_relation_stats() or
        # pg_restore_attribute_stats().
-       $dump =~ s {(^\s+'version',) '\d+'::integer,$}
+       $dump =~ s {^(\s+'version',) '\d+'::integer,$}
                {$1 '000000'::integer,}mg;

        if ($old_version < 16)
@@ -645,7 +645,7 @@ sub adjust_new_dumpfile

        # Same with version argument to pg_restore_relation_stats() or
        # pg_restore_attribute_stats().
-       $dump =~ s {(^\s+'version',) '\d+'::integer,$}
+       $dump =~ s {^(\s+'version',) '\d+'::integer,$}
                {$1 '000000'::integer,}mg;


I also tested the same regexp expression in isolation
and I could not repro the issue. Will try a real dumpfile next.

/tmp/test.pl
"""
use strict;

my $dump = " 'version', '180000'::integer,";
print "$dump\n";

$dump =~ s [(^\s+'version',) '\d+'::integer,$] [$1 '000000'::integer,]mg;

print "$dump\n";

"""

simseih %  /usr/bin/perl -v | grep "This is perl"
This is perl 5, version 16, subversion 3 (v5.16.3) built for
x86_64-linux-thread-multi

simseih % /usr/bin/perl /tmp/test.pl
 'version', '180000'::integer,
 'version', '000000'::integer,

--

Sami



Re: pgsql: Trial fix for old cross-version upgrades.

From
Sami Imseih
Date:
> I also tested the same regexp expression in isolation
> and I could not repro the issue. Will try a real dumpfile next.
>

repro'd. I don't want to attach the file here, but I added a cp
to get the dump file somewhere local

+++ b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
@@ -326,10 +326,12 @@ $oldnode->restart;
 my @dump_command = (
        'pg_dumpall', '--no-sync', '-d', $oldnode->connstr('postgres'),
        '-f', $dump1_file);
+
 # --extra-float-digits is needed when upgrading from a version older than 11.
 push(@dump_command, '--extra-float-digits', '0')
   if ($oldnode->pg_version < 12);
 $newnode->command_ok(\@dump_command, 'dump before running pg_upgrade');
+system("cp $dump1_file /tmp/myfile.dmp");

and ran with the following perl script

"""
use strict;
use File::Slurp;

my $dump = read_file( '/tmp/myfile.dmp' );
#my $dump = "filler\n 'version', '180000'::integer,\n filler";
print "$dump\n";
$dump =~ s [(^\s+'version',) '\d+'::integer,$] [$1 '000000'::integer,]mg;

print "$dump\n";
"""

--
Sami



Re: pgsql: Trial fix for old cross-version upgrades.

From
Sami Imseih
Date:
My perl expertise is bit shallow, but I could not find much
regarding bugs related to such behavior, or maybe I did not
look enough.

Playing around with this, "s+", "s{1,}", s{2,}" all of these combinations
where we are searching for more than 1 space result in the hanging command,
but we really only need to look for a single space before the 'version',
so maybe we can just do the below, which works?

-       $dump =~ s {(^\s+'version',) '\d+'::integer,$}
+      $dump =~ s {(^\s{1}'version',) '\d+'::integer,$}

-- 
Sami



Re: pgsql: Trial fix for old cross-version upgrades.

From
Tom Lane
Date:
Sami Imseih <samimseih@gmail.com> writes:
> repro'd. I don't want to attach the file here, but I added a cp
> to get the dump file somewhere local

Thanks for the tips about an efficient repro.  I installed 5.16.3
locally using perlbrew and was able to duplicate the problem.
After a bit of fooling around I found that using an explicit \n
instead of ^ fixes it.  Not the world's most idiomatic regex,
but it'll do.  Will push the attached shortly.

            regards, tom lane

diff --git a/src/test/perl/PostgreSQL/Test/AdjustUpgrade.pm b/src/test/perl/PostgreSQL/Test/AdjustUpgrade.pm
index ec874852d12..81a8f44aa9f 100644
--- a/src/test/perl/PostgreSQL/Test/AdjustUpgrade.pm
+++ b/src/test/perl/PostgreSQL/Test/AdjustUpgrade.pm
@@ -296,7 +296,7 @@ sub adjust_old_dumpfile

     # Same with version argument to pg_restore_relation_stats() or
     # pg_restore_attribute_stats().
-    $dump =~ s {(^\s+'version',) '\d+'::integer,$}
+    $dump =~ s {\n(\s+'version',) '\d+'::integer,$}
         {$1 '000000'::integer,}mg;

     if ($old_version < 16)
@@ -645,7 +645,7 @@ sub adjust_new_dumpfile

     # Same with version argument to pg_restore_relation_stats() or
     # pg_restore_attribute_stats().
-    $dump =~ s {(^\s+'version',) '\d+'::integer,$}
+    $dump =~ s {\n(\s+'version',) '\d+'::integer,$}
         {$1 '000000'::integer,}mg;

     # pre-v16 dumps do not know about XMLSERIALIZE(NO INDENT).

Re: pgsql: Trial fix for old cross-version upgrades.

From
Sami Imseih
Date:
> After a bit of fooling around I found that using an explicit \n
> instead of ^ fixes it.  Not the world's most idiomatic regex,
> but it'll do.  Will push the attached shortly.

I can also confirm this works as expected, although I feel my
suggested expression [0] is easier to understand, but I have no
strong opinion either way.

Regards,

--
Sami

[0] https://www.postgresql.org/message-id/CAA5RZ0tz-AibaxVF1NnNMcMEmwut%2Bkn%3Dk%3DxG2XmDN9d6kv2-%3Dg%40mail.gmail.com



Re: pgsql: Trial fix for old cross-version upgrades.

From
Andrew Dunstan
Date:
On 2025-02-28 Fr 2:29 PM, Sami Imseih wrote:
> My perl expertise is bit shallow, but I could not find much
> regarding bugs related to such behavior, or maybe I did not
> look enough.
>
> Playing around with this, "s+", "s{1,}", s{2,}" all of these combinations
> where we are searching for more than 1 space result in the hanging command,
> but we really only need to look for a single space before the 'version',
> so maybe we can just do the below, which works?
>
> -       $dump =~ s {(^\s+'version',) '\d+'::integer,$}
> +      $dump =~ s {(^\s{1}'version',) '\d+'::integer,$}
>


Just noting here that \s{1} is simply the same as \s


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: pgsql: Trial fix for old cross-version upgrades.

From
Sami Imseih
Date:


Just noting here that \s{1} is simply the same as /s

Correct. But I do like the explicitness of it though. 


— 
Sami 

Re: pgsql: Trial fix for old cross-version upgrades.

From
Tom Lane
Date:
Sami Imseih <samimseih@gmail.com> writes:
>> Just noting here that \s{1} is simply the same as /s

> Correct. But I do like the explicitness of it though.

Yeah.  I like mine better though because it's not dependent
on the amount of whitespace at the start of the line.

            regards, tom lane