Re: Test to dump and restore objects left behind by regression - Mailing list pgsql-hackers

From Ashutosh Bapat
Subject Re: Test to dump and restore objects left behind by regression
Date
Msg-id CAExHW5snZ+y47BwH89EhhDc7ZtWN4VD0iWrODbvA6q2XQitqvA@mail.gmail.com
Whole thread Raw
In response to Re: Test to dump and restore objects left behind by regression  (Michael Paquier <michael@paquier.xyz>)
Responses Re: Test to dump and restore objects left behind by regression
List pgsql-hackers


On Tue, Jun 4, 2024 at 4:28 AM Michael Paquier <michael@paquier.xyz> wrote:
On Fri, Apr 26, 2024 at 06:38:22PM +0530, Ashutosh Bapat wrote:
> Some points for discussion:
> 1. The test still hardcodes the diffs between two dumps. Haven't found a
> better way to do it. I did consider removing the problematic objects from
> the regression database but thought against it since we would lose some
> coverage.
>
> 2. The new code tests dump and restore of just the regression database and
> does not use pg_dumpall like pg_upgrade. Should it instead perform
> pg_dumpall? I decided against it since a. we are interested in dumping and
> restoring objects left behind by regression, b. I didn't find a way to
> provide the format option to pg_dumpall. The test could be enhanced to use
> different dump formats.
>
> I have added it to the next commitfest.
> https://commitfest.postgresql.org/48/4956/

Ashutosh and I have discussed this patch a bit last week.  Here is a
short summary of my input, after I understood what is going on.

+       # We could avoid this by dumping the database loaded from original dump.
+       # But that would change the state of the objects as left behind by the
+       # regression.
+       my $expected_diff = " --
+ CREATE TABLE public.gtestxx_4 (
+-    b integer,
+-    a integer NOT NULL
++    a integer NOT NULL,
++    b integer
+ )
[...]
+       my ($stdout, $stderr) =
+               run_command([ 'diff', '-u', $dump4_file, $dump5_file]);
+       # Clear file names, line numbers from the diffs; those are not going to
+       # remain the same always. Also clear empty lines and normalize new line
+       # characters across platforms.
+       $stdout =~ s/^\@\@.*$//mg;
+       $stdout =~ s/^.*$dump4_file.*$//mg;
+       $stdout =~ s/^.*$dump5_file.*$//mg;
+       $stdout =~ s/^\s*\n//mg;
+       $stdout =~ s/\r\n/\n/g;
+       $expected_diff =~ s/\r\n/\n/g;
+       is($stdout, $expected_diff, 'old and new dumps match after dump and restore');
+}

I am not a fan of what this patch does, adding the knowledge related
to the dump filtering within 002_pg_upgrade.pl.  Please do not take
me wrong, I am not against the idea of adding that within this
pg_upgrade test to save from one full cycle of `make check` to check
the consistency of the dump.  My issue is that this logic should be
externalized, and it should be in fewer lines of code.

For the externalization part, Ashutosh and I considered a few ideas,
but one that we found tempting is to create a small .pm, say named
AdjustDump.pm.  This would share some rules with the existing
AdjustUpgrade.pm, which would be fine IMO even if there is a small
overlap, documenting the dependency between each module.  That makes
the integration with the buildfarm much simpler by not creating more
dependencies with the modules shared between core and the buildfarm
code.  For the "shorter" part, one idea that I had is to apply to the
dump a regexp that wipes out the column definitions within the
parenthesis, keeping around the CREATE TABLE and any other attributes
not impacted by the reordering.  All that should be documented in the
module, of course.

Thanks for the suggestion. I didn't understand the dependency with the buildfarm module. Will the new module be used in buildfarm separately? I will work on this soon. Thanks for changing CF entry to WoA.
 

Another thing would be to improve the backend so as we are able to
a better support for physical column ordering, which would, I assume
(and correct me if I'm wrong!), prevent the reordering of the
attributes like in this inheritance case.  But that would not address
the case of dumps taken from older versions with a new version of
pg_dump, which is something that may be interesting to have more tests
for in the long-term.  Overall a module sounds like a better solution.

Changing the physical order of column of a child table based on the inherited table seems intentional as per MergeAttributes(). That logic looks sane by itself. In binary mode pg_dump works very hard to retain the column order by issuing UPDATE commands against catalog tables. I don't think mimicking that behaviour is the right choice for non-binary dump. I agree with your conclusion that we fix it in by fixing the diffs. The code to do that will be part of a separate module.

--
Best Wishes,
Ashutosh Bapat

pgsql-hackers by date:

Previous
From: Ashutosh Bapat
Date:
Subject: Re: meson and check-tests
Next
From: Jelte Fennema-Nio
Date:
Subject: libpq: Trace StartupMessage/SSLRequest/GSSENCRequest correctly