Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL - Mailing list pgsql-hackers

From Bharath Rupireddy
Subject Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL
Date
Msg-id CALj2ACU-_HixN-rz0tKsab6b3w-+3w733ceUW6oKSJ7Pgp2R+Q@mail.gmail.com
Whole thread Raw
In response to [PATCH] Teach pg_waldump to extract FPIs from the WAL  (David Christensen <david.christensen@crunchydata.com>)
Responses Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL
List pgsql-hackers
On Sat, Apr 23, 2022 at 4:21 AM David Christensen
<david.christensen@crunchydata.com> wrote:
>
> Hi -hackers,
>
> Enclosed is a patch to allow extraction/saving of FPI from the WAL
> stream via pg_waldump.
>
> Description from the commit:
>
> Extracts full-page images from the WAL stream into a target directory,
> which must be empty or not
> exist.  These images are subject to the same filtering rules as normal
> display in pg_waldump, which
> means that you can isolate the full page writes to a target relation,
> among other things.
>
> Files are saved with the filename: <lsn>.<ts>.<db>.<rel>.<blk> with
> formatting to make things
> somewhat sortable; for instance:
>
> 00000000-010000C0.1663.1.6117.0
> 00000000-01000150.1664.0.6115.0
> 00000000-010001E0.1664.0.6114.0
> 00000000-01000270.1663.1.6116.0
> 00000000-01000300.1663.1.6113.0
> 00000000-01000390.1663.1.6112.0
> 00000000-01000420.1663.1.8903.0
> 00000000-010004B0.1663.1.8902.0
> 00000000-01000540.1663.1.6111.0
> 00000000-010005D0.1663.1.6110.0
>
> It's noteworthy that the raw images do not have the current LSN stored
> with them in the WAL
> stream (as would be true for on-heap versions of the blocks), nor
> would the checksum be valid in
> them (though WAL itself has checksums, so there is some protection
> there).  This patch chooses to
> place the LSN and calculate the proper checksum (if non-zero in the
> source image) in the outputted
> block.  (This could perhaps be a targetted flag if we decide we don't
> always want this.)
>
> These images could be loaded/inspected via `pg_read_binary_file()` and
> used in the `pageinspect`
> suite of tools to perform detailed analysis on the pages in question,
> based on historical
> information, and may come in handy for forensics work.

Thanks for working on this. I'm just thinking if we can use these FPIs
to repair the corrupted pages? I would like to understand more
detailed usages of the FPIs other than inspecting with pageinspect.

Given that others have realistic use-cases (of course I would like to
know more about those), +1 for the idea. However, I would suggest
adding a function to extract raw FPI data to the pg_walinspect
extension that got recently committed in PG 15, the out of which can
directly be fed to pageinspect functions or

Few comments:
1) I think it's good to mention the stored file name format.
+ printf(_("  -W, --raw-fpi=path     save found full page images to
given path\n"));
2)
+ for (block_id = 0; block_id <= XLogRecMaxBlockId(record); block_id++)
+ {
+ /* we will now extract the fullpage image from the XLogRecord and save
+ * it to a calculated filename */
+
+ if (XLogRecHasBlockImage(record, block_id))

I think we need XLogRecHasBlockRef to be true to check
XLogRecHasBlockImage otherwise, we will see some build farms failing,
recently I've seen this failure for pg_walinspect..

    for (block_id = 0; block_id <= XLogRecMaxBlockId(record); block_id++)
    {
        if (!XLogRecHasBlockRef(record, block_id))
            continue;

        if (XLogRecHasBlockImage(record, block_id))
            *fpi_len += XLogRecGetBlock(record, block_id)->bimg_len;
    }
3) Please correct the commenting format:
+ /* we will now extract the fullpage image from the XLogRecord and save
+ * it to a calculated filename */
4) Usually we start errors with lower case letters "could not ....."
+ pg_fatal("Couldn't open file for output: %s", filename);
+ pg_fatal("Couldn't write out complete FPI to file: %s", filename);
And the variable name too:
+ FILE *OPF;
5) Not sure how the FPIs of TOASTed tables get stored, but it would be
good to check.
6) Good to specify the known usages of FPIs in the documentation.
7) Isn't it good to emit an error if RestoreBlockImage returns false?
+ if (RestoreBlockImage(record, block_id, page))
+ {
8) I think I don't mind if a non-empty directory is specified - IMO
better usability is this - if the directory is non-empty, just go add
the FPI files if FPI file exists just replace it, if the directory
isn't existing, create and write the FPI files.
+ /* we accept an empty existing directory */
+ if (stat(config.save_fpw_path, &st) == 0 && S_ISDIR(st.st_mode))
+ {
 9) Instead of following:
+ if (XLogRecordHasFPW(xlogreader_state))
+ XLogRecordSaveFPWs(xlogreader_state, config.save_fpw_path);
I will just do this in XLogRecordSaveFPWs:
    for (block_id = 0; block_id <= XLogRecMaxBlockId(record); block_id++)
    {
        if (!XLogRecHasBlockRef(record, block_id))
            continue;

        if (XLogRecHasBlockImage(record, block_id))
        {

        }
    }
10) Along with pg_pwrite(), can we also fsync the files (of course
users can choose it optionally) so that the writes will be durable for
the OS crashes?

Regards,
Bharath Rupireddy.



pgsql-hackers by date:

Previous
From: "Daniel Verite"
Date:
Subject: Re: variable filename for psql \copy
Next
From: Bharath Rupireddy
Date:
Subject: Re: pg_receivewal fail to streams when the partial file to write is not fully initialized present in the wal receiver directory