Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL - Mailing list pgsql-hackers

From Bharath Rupireddy
Subject Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL
Date
Msg-id CALj2ACV4donjBLaxkJ1FO3Ybg73ut--9TozHTsNyBVCdX9opYA@mail.gmail.com
Whole thread Raw
In response to Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL  (David Christensen <david.christensen@crunchydata.com>)
Responses Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL
List pgsql-hackers
On Wed, Nov 16, 2022 at 1:20 AM David Christensen
<david.christensen@crunchydata.com> wrote:
>
> Enclosed is v9.
>
> - code style consistency (FPI instead of FPW) internally.
> - cleanup of no-longer needed checksum-related pieces from code and tests.
> - test cleanup/simplification.
> - other comment cleanup.
>
> Passes all CI checks.

Thanks for the updated patch.

1.
-        if (config.filter_by_fpw && !XLogRecordHasFPW(xlogreader_state))
+        if (config.filter_by_fpw && !XLogRecordHasFPI(xlogreader_state))
These changes are not related to this feature, hence renaming those
variables/function names must be dealt with separately. If required,
proposing another patch can be submitted to change filter_by_fpw to
filter_by_fpi and XLogRecordHasFPW() to XLogRecordHasFPI().

2.
+        /* We fsync our output directory only; since these files are not part
+         * of the production database we do not require the performance hit
+         * that fsyncing every FPI would entail, so are doing this as a
+         * compromise. */
The commenting style doesn't match the standard that we follow
elsewhere in postgres, please refer to other multi-line comments.

3.
+        fsync_fname(config.save_fpi_path, true);
+    }
It looks like fsync_fname()/fsync() in general isn't recursive, in the
sense that it doesn't fsync the files under the directory, but the
directory only. So, the idea of directory fsync doesn't seem worth it.
We either 1) get rid of fsync entirely or 2) fsync all the files after
they are created and the directory at the end or 3) do option (2) with
--no-sync option similar to its friends. Since option (2) is a no go,
we can either choose option (1) or option (2). My vote at this point
is for option (1).

4.
+($walfile_name, $blocksize) = split '\|' =>
$node->safe_psql('postgres',"SELECT pg_walfile_name(pg_switch_wal()),
current_setting('block_size')");
+my $walfile = $node->basedir . '/pgdata/pg_wal/' . $walfile_name;
I think there's something wrong with this, no? pg_switch_wal() can, at
times, return end+1 of the prior segment (see below snippet) and I'm
not sure if such a case can happen here.

 * The return value is either the end+1 address of the switch record,
 * or the end+1 address of the prior segment if we did not need to
 * write a switch record because we are already at segment start.
 */
XLogRecPtr
RequestXLogSwitch(bool mark_unimportant)

5.
+my $walfile = $node->basedir . '/pgdata/pg_wal/' . $walfile_name;
+ok(-f $walfile, "Got a WAL file");
Is this checking if the WAL file is present or not in PGDATA/pg_wal?
If yes, I think this isn't required as pg_switch_wal() ensures that
the WAL is written and flushed to disk.

6.
+my $walfile = $node->basedir . '/pgdata/pg_wal/' . $walfile_name;
Isn't "pgdata" hardcoded here? I think you might need to do the following:
$node->data_dir . '/pg_wal/' . $walfile_name;;

7.
+    # save filename for later verification
+    $files{$file}++;

+# validate that we ended up with some FPIs saved
+ok(keys %files > 0, 'verify we processed some files');
Why do we need to store filenames in an array when we later just check
the size of the array? Can't we use a boolean (file_found) or an int
variable (file_count) to verify that we found the file.

8.
+$node->safe_psql('postgres', <<EOF);
+SELECT 'init' FROM
pg_create_physical_replication_slot('regress_pg_waldump_slot', true,
false);
+CREATE TABLE test_table AS SELECT generate_series(1,100) a;
+CHECKPOINT; -- required to force FPI for next writes
+UPDATE test_table SET a = a + 1;
+EOF
The EOF with append_conf() is being used in 4 files and elsewhere in
the TAP test files (more than 100?) qq[] or quotes is being used. I
have no strong opinion here, I'll leave it to the other reviewers or
committer.

> > 11.
> > +# verify filename formats matches w/--save-fpi
> > +for my $fullpath (glob "$tmp_folder/raw/*")
> > Do we need to look for the exact match of the file that gets created
> > in the save-fpi path? While checking for this is great, it makes the
> > test code non-portable (may not work on Windows or other platforms,
> > no?) and complex? This way, you can get rid of get_block_info() as
> > well? And +for my $fullpath (glob "$tmp_folder/raw/*")
> > will also  get simplified.
> >
> > I think you can further simplify the tests by:
> > create the node
> > generate an FPI
> > call pg_waldump with save-fpi option
> > check the target directory for a file that contains the relid,
> > something like '%relid%'.
> >
> > The above would still serve the purpose, tests the code without much complexity.
>
> I disagree; I think there is utility in keeping the validation of the
> expected output.  Since we have the code that works for it (and does
> work on Windows, per passing the CI tests) I'm not seeing why we
> wouldn't want to continue to validate as much as possible.

My intention is to simplify the tests further and I still stick to it.
It looks like the majority of test code is to form the file name in
the format that pg_waldump outputs and match with the file name in the
target directory - for instance, in get_block_info(), and in the loop
for my $fullpath (glob "$tmp_folder/raw/*"). I don't think the tests
need to aim for file format checks, it's enough to look for the
written file with '%relid%' by pg_waldump, if needed, the contents of
the files written/FPI can also be verified with, say, pg_checksum
tool. Others may have different opinions though.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



pgsql-hackers by date:

Previous
From: Simon Riggs
Date:
Subject: Re: Reducing power consumption on idle servers
Next
From: Daniel Gustafsson
Date:
Subject: Re: pg_basebackup's --gzip switch misbehaves